Array for detecting microbes

ABSTRACT

The present embodiments relate to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims the benefit of priority to, PCT Application No. PCT/US2007/024720, filed Nov. 29, 2007, which was written in English, published in English as WO/2008/130394 and designated the United States of America, which claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 60/861,834 filed Nov. 30, 2006, both of which are hereby incorporated by reference in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED R&D

This invention was made with Government support under Grant No. DE-AC03-76SF00098 from the Department of Homeland Security and Contract No. DE-ACO-05CH11231 from the Department of Energy.

REFERENCE TO SEQUENCE LISTING

The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled LBNL026C1.TXT, created May 28, 2009, which is 5.51 KB in size. The information in the electronic format of the Sequence Listing is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present embodiments relate to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

2. Description of the Related Art

In the fields of molecular biology and biochemistry, biopolymers such as nucleic acids and proteins from organisms are identified and/or fractionated in order to search for useful genes, diagnose diseases or identify organisms. A hybridization reaction is frequently used as a pretreatment for such a process, where a target molecule in a sample is hybridized with a nucleic acid or a protein having a known sequence. For this purpose, microarrays, or DNA chips, are used on which probes such as DNAs, RNAs or proteins with known sequences are immobilized at predetermined positions.

A DNA microarray (also commonly known as gene or genome chip, DNA chip, or gene array) is a collection of microscopic DNA spots attached to a solid surface, such as a glass, plastic or silicon chip, forming an array. The affixed DNA segments are known as probes (although some sources use different nomenclature), thousands of which can be used in a single DNA microarray. Measuring gene expression using microarrays is relevant to many areas of biology and medicine, such as studying treatments, disease, and developmental stages. For example, microarrays can be used to identify disease genes by comparing gene expression in diseased and normal cells.

Molecular approaches designed to describe organism diversity routinely rely upon classifying heterogeneous nucleic acids amplified by universal 16S RNA gene PCR (polymerase chain reaction). The resulting mixed amplicons can be quickly, but coarsely, typed into anonymous groups using T-/RFLP (Terminal Restriction Fragment Length Polymorphism), SSCP (single-strand conformation polymorphism) or T/DGGE (temperature/denaturing gradient gel electrophoresis). These groups may be classified through sequencing, but this requires additional labor to physically isolate each 16S RNA type, does not scale well for large comparative studies such as environmental monitoring, and is only suitable for low complexity environments. Also, the number of clones that would be required to adequately catalogue the majority of taxa in a sample is too large to be efficiently or economically handled. As such, an improved array and method are needed to efficiently analyze a plurality of organisms without the disadvantages of the above technologies.

SUMMARY OF THE INVENTION

Some embodiments relate to an array system including a microarray configured to simultaneously detect a plurality of organisms in a sample, wherein the microarray comprises fragments of 16s RNA unique to each organism and variants of said fragments comprising at least 1 nucleotide mismatch, wherein the level of confidence of species-specific detection derived from fragment matches is about 90% or higher.

In one aspect, the plurality of organisms comprise bacteria or archaea.

In another aspect, the fragments of 16s RNA are clustered and aligned into groups of similar sequence such that detection of an organism based on at least 1 fragment match is possible.

In yet another aspect, the level of confidence of species-specific detection derived from fragment matches is about 95% or higher.

In still another aspect, the level of confidence of species-specific detection derived from fragment matches is about 98% or higher.

In some embodiments, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In some aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In other aspects, the fragments are about 25 nucleotides long.

In some aspects, the sample is an environmental sample.

In other aspects, the environmental sample comprises at least one of soil, water or atmosphere.

In yet other aspects, the sample is a clinical sample.

In still other aspects, the clinical sample comprises at least one of tissue, skin, bodily fluid or blood.

Some embodiments relate to a method of detecting an organism including applying a sample comprising a plurality of organisms to the array system which includes a microarray that comprises fragments of 16s RNA unique to each organism and variants of said fragments comprising at least 1 nucleotide mismatch, wherein the level of confidence of species-specific detection derived from fragment matches is about 90% or higher; and identifying organisms in the sample.

In some aspects, the plurality of organisms comprise bacteria or archaea.

In other aspects, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In still other aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In yet other aspects, the fragments are about 25 nucleotides long.

In some aspects, the organism to be detected is the most metabolically active organism in the sample.

Some embodiments relate to a method of fabricating an array system including identifying 16s RNA sequences corresponding to a plurality of organisms of interest; selecting fragments of 16s RNA unique to each organism; creating variant RNA fragments corresponding to the fragments of 16s RNA unique to each organism which comprise at least 1 nucleotide mismatch; and fabricating said array system.

In some aspects, the plurality of organisms comprise bacteria or archaea.

In other aspects, the majority of fragments of 16s RNA unique to each organism have a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In still other aspects, every fragment of 16s RNA unique to each organism has a corresponding variant fragment comprising at least 1 nucleotide mismatch.

In yet other aspects, the fragments are about 25 nucleotides long.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a bar graph showing the rank-abundance curve of phylotypes within the urban aerosol clone library obtained from San Antonio calendar week 29. Phylotypes were determined by clustering at 99% homology using nearest neighbor joining.

FIG. 2 is a line graph showing that Chao1 and ACE richness estimators are non-asymptotic, indicating an under-estimation of predicted richness based on numbers of clones sequenced.

FIG. 3 is a graph showing a Latin square assessment of 16S rRNA gene sequence quantitation by microarray.

FIG. 4 is a graph showing a comparison of real-time PCR and array monitoring of Pseudomonas oleovorans density in aerosol samples from San Antonio. Corrected Array Hybridization Score is the ln(intensity) normalized by internal spikes as described under Normalization.

DETAILED DESCRIPTION

The present embodiments are related to an array system for detecting and identifying biomolecules and organisms. More specifically, the present embodiments relate to an array system comprising a microarray configured to simultaneously detect a plurality of organisms in a sample at a high confidence level.

In some embodiments, the array system uses multiple probes for increasing confidence of identification of a particular organism using a 16S rRNA gene targeted high density microarray. The use of multiple probes can greatly increase the confidence level of a match to a particular organism. Also, in some embodiments, mismatch control probes corresponding to each perfect match probe can be used to further increase confidence of sequence-specific hybridization of a target to a probe. Probes with one or more mismatches can be used to indicate non-specific binding and a possible non-match. This has the advantage of reducing false positive results due to non-specific hybridization, which is a significant problem with many current microarrays.

Some embodiments of the invention relate to a method of using an array to simultaneously identify multiple prokaryotic taxa with a relatively high confidence. A taxon is an individual microbial species or group of highly related species that share an average of about 97% 16S rRNA gene sequence identity. The array system of the current embodiments may use multiple confirmatory probes, each with from about 1 to about 20 corresponding mismatch control probes to target the most unique regions within a 16S rRNA gene for about 9000 taxa. Preferably, each confirmatory probe has from about 1 to about 10 corresponding mismatch probes. More preferably, each confirmatory probe has from about 1 to about 5 corresponding mismatch probes. The aforementioned about 9000 taxa represent a majority of the taxa that are currently known through 16S rRNA clone sequence libraries. In some embodiments, multiple targets can be assayed through a high-density oligonucleotide array. The sum of all target hybridizations is used to identify specific prokaryotic taxa. The result is a much more efficient and less time consuming way of identifying unknown organisms that, in addition to providing results that could not previously be achieved, can also provide results in hours that other methods would require days to achieve.

In some embodiments, the array system of the present embodiments can be fabricated using 16s rRNA sequences as follows. From about 1 to about 500 short probes can be designed for each taxonomic group. In some embodiments, the probes can be proteins, antibodies, tissue samples or oligonucleotide fragments. In certain examples, oligonucleotide fragments are used as probes. In some embodiments, from about 1 to about 500 short oligonucleotide probes, preferably from about 2 to about 200 short oligonucleotide probes, more preferably from about 5 to about 150 short oligonucleotide probes, even more preferably from about 8 to about 100 short oligonucleotide probes can be designed for each taxonomic grouping, allowing for the failure of one or more probes. In one example, at least about 11 short oligonucleotide probes are used for each taxonomic group. The oligonucleotide probes can each be from about 5 bp to about 100 bp, preferably from about 10 bp to about 50 bp, more preferably from about 15 bp to about 35 bp, even more preferably from about 20 bp to about 30 bp. In some embodiments, the probes may be 5-mers, 6-mers, 7-mers, 8-mers, 9-mers, 10-mers, 11-mers, 12-mers, 13-mers, 14-mers, 15-mers, 16-mers, 17-mers, 18-mers, 19-mers, 20-mers, 21-mers, 22-mers, 23-mers, 24-mers, 25-mers, 26-mers, 27-mers, 28-mers, 29-mers, 30-mers, 31-mers, 32-mers, 33-mers, 34-mers, 35-mers, 36-mers, 37-mers, 38-mers, 39-mers, 40-mers, 41-mers, 42-mers, 43-mers, 44-mers, 45-mers, 46-mers, 47-mers, 48-mers, 49-mers, 50-mers, 51-mers, 52-mers, 53-mers, 54-mers, 55-mers, 56-mers, 57-mers, 58-mers, 59-mers, 60-mers, 61-mers, 62-mers, 63-mers, 64-mers, 65-mers, 66-mers, 67-mers, 68-mers, 69-mers, 70-mers, 71-mers, 72-mers, 73-mers, 74-mers, 75-mers, 76-mers, 77-mers, 78-mers, 79-mers, 80-mers, 81-mers, 82-mers, 83-mers, 84-mers, 85-mers, 86-mers, 87-mers, 88-mers, 89-mers, 90-mers, 91-mers, 92-mers, 93-mers, 94-mers, 95-mers, 96-mers, 97-mers, 98-mers, 99-mers, 100-mers or combinations thereof.

Non-specific cross hybridization can be an issue when an abundant 16S rRNA gene shares sufficient sequence similarity to non-targeted probes, such that a weak but detectable signal is obtained. The use of sets of perfect match and mismatch probes (PM-MM) effectively minimizes the influence of cross-hybridization. In certain embodiments, each perfect match probe (PM) has one corresponding mismatch probe (MM) to form a pair that is useful for analyzing a particular 16S rRNA sequence. In other embodiments, each PM has more than one corresponding MM. Additionally, different PMs can have different numbers of corresponding MM probes. In some embodiments, each PM has from about 1 to about 20 MM, preferably, each PM has from about 1 to about 10 MM and more preferably, each PM has from about 1 to about 5 MM.

Any of the nucleotide bases can be replaced in the MM probe to result in a probe having a mismatch. In one example, the central nucleotide base sequence can be replaced with any of the three non-matching bases. In other examples, more than one nucleotide base in the MM is replaced with a non-matching base. In some examples, 10 nucleotides are replaced in the MM, in other examples, 5 nucleotides are replaced in the MM, in yet other examples 3 nucleotides are replaced in the MM, and in still other examples, 2 nucleotides are replaced in the MM. This is done so that the increased hybridization intensity signal of the PM over the one or more MM indicates a sequence-specific positive hybridization. By requiring multiple PM-MM probes to have a confirmation interaction, the chance that the hybridization signal is due to a predicted target sequence is substantially increased.
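
The single-base replacement described above can be illustrated with a minimal Python sketch. The function name mismatch_variants and the example 25-mer are hypothetical and are not taken from the specification; the sketch only assumes that a probe is a string over the bases A, C, G and T and that, by default, the central base is the one replaced.

```python
def mismatch_variants(pm_probe, position=None):
    """Return the three probes that differ from pm_probe at a single position.

    position is a 0-based index; by default the central base is replaced.
    """
    if position is None:
        position = len(pm_probe) // 2  # index 12 for a 25-mer (the 13th base)
    original = pm_probe[position]
    return [
        pm_probe[:position] + base + pm_probe[position + 1:]
        for base in "ACGT"
        if base != original
    ]


# Hypothetical 25-mer used only for illustration.
pm = "ACGTACGTACGTACGTACGTACGTA"
mm_probes = mismatch_variants(pm)
```

Each returned string is a candidate MM probe for the given PM probe; replacing more than one base, as in the other examples above, would require calling the function repeatedly at different positions.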

In other embodiments, the 16S rRNA gene sequences can be grouped into distinct taxa such that a set of the short oligonucleotide probes that are specific to the taxon can be chosen. In some examples, the 16s rRNA gene sequences grouped into distinct taxa are from about 100 bp to about 1000 bp, preferably the gene sequences are from about 400 bp to about 900 bp, more preferably from about 500 bp to about 800 bp. The resulting about 9000 taxa represented on the array, each containing from about 1% to about 5% sequence divergence, preferably about 3% sequence divergence, can represent substantially all demarcated bacterial and archaeal orders.

In some embodiments, for a majority of the taxa represented on the array, probes can be designed from regions of gene sequences that have only been identified within a given taxon. In other embodiments, some taxa have no probe-level sequence that can be identified that is not shared with other groups of 16S rRNA gene sequences. For these taxonomic groupings, a set of from about 1 to about 500 short oligonucleotide probes, preferably from about 2 to about 200 short oligonucleotide probes, more preferably from about 5 to about 150 short oligonucleotide probes, even more preferably from about 8 to about 100 short oligonucleotide probes can be designed to a combination of regions on the 16S rRNA gene that taken together as a whole do not exist in any other taxa. For the remaining taxa, a set of probes can be selected to minimize the number of putative cross-reactive taxa. For all three probe set groupings, the advantage of the hybridization approach is that multiple taxa can be identified simultaneously by targeting unique regions or combinations of sequence.

In some embodiments, oligonucleotide probes can then be selected to obtain an effective set of probes capable of correctly identifying the sample of interest. In certain embodiments, the probes are chosen based on various taxonomic organizations useful in the identification of particular sets of organisms.

In some embodiments, the chosen oligonucleotide probes can then be synthesized by any available method in the art. Some examples of suitable methods include printing with fine-pointed pins onto glass slides, photolithography using pre-made masks, photolithography using dynamic micromirror devices, ink-jet printing or electrochemistry. In one example, a photolithographic method can be used to directly synthesize the chosen oligonucleotide probes onto a surface. Suitable examples for the surface include glass, plastic, silicon and any other surface available in the art. In certain examples, the oligonucleotide probes can be synthesized on a glass surface at an approximate density of from about 1,000 probes per μm² to about 100,000 probes per μm², preferably from about 2000 probes per μm² to about 50,000 probes per μm², more preferably from about 5000 probes per μm² to about 20,000 probes per μm². In one example, the density of the probes is about 10,000 probes per μm². The array can then be arranged in any configuration, such as, for example, a square grid of rows and columns. Some areas of the array can be oligonucleotide 16S rDNA PM or MM probes, and others can be used for image orientation, normalization controls or other analyses. In some embodiments, materials for fabricating the array can be obtained from Affymetrix, GE Healthcare (Little Chalfont, Buckinghamshire, United Kingdom) or Agilent Technologies (Palo Alto, Calif.).

In some embodiments, the array system is configured to have controls. Some examples of such controls include 1) probes that target amplicons of prokaryotic metabolic genes spiked into the 16S rDNA amplicon mix in defined quantities just prior to fragmentation and 2) probes complementary to a pre-labeled oligonucleotide added into the hybridization mix. The first control collectively tests the fragmentation, biotinylation, hybridization, staining and scanning efficiency of the array system. It also allows the overall fluorescent intensity to be normalized across all the arrays in an experiment. The second control directly assays the hybridization, staining and scanning of the array system. However, the array system of the present embodiments is not limited to these particular examples of possible controls.

The accuracy of the array of some embodiments has been validated by comparing the results of some arrays with 16S rRNA gene sequences from approximately 700 clones in each of 3 samples. A specific taxon is identified as being present in a sample if a majority (from about 70% to about 100%, preferably from about 80% to about 100% and more preferably from about 90% to about 100%) of the probes on the array have a hybridization signal about 100 times, 200 times, 300 times, 400 times or 500 times greater than that of the background and the perfect match probe has a significantly greater hybridization signal than its one or more partner mismatch control probe or probes. This ensures a higher probability of a sequence specific hybridization to the probe. In some embodiments, the use of multiple probes, each independently indicating that the target sequence of the taxonomic group being identified is present, increases the probability of a correct identification of the organism of interest.
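
The presence criterion described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the scoring procedure of the embodiments: the function name taxon_present and its parameters are hypothetical, and because the text does not quantify "significantly greater", the pm_mm_ratio default of 1.3 is an arbitrary placeholder. The min_fraction and background_fold defaults correspond to the "about 90%" and "about 100 times" values recited above.

```python
def taxon_present(probe_pairs, background, min_fraction=0.9,
                  background_fold=100.0, pm_mm_ratio=1.3):
    """Decide whether a taxon is called present from its PM/MM intensities.

    probe_pairs: list of (pm_intensity, mm_intensity) tuples for one taxon.
    background: background intensity of the array.
    pm_mm_ratio: ASSUMED stand-in for "significantly greater" (not from the text).
    """
    if not probe_pairs:
        return False
    positive = sum(
        1
        for pm, mm in probe_pairs
        if pm > background_fold * background and pm > pm_mm_ratio * mm
    )
    return positive / len(probe_pairs) >= min_fraction


# Hypothetical intensities: 10 of 11 probe pairs respond clearly.
pairs = [(5000.0, 1000.0)] * 10 + [(50.0, 40.0)]
call = taxon_present(pairs, background=10.0)
```

With eleven probe pairs per taxon, this threshold tolerates the failure of a single pair, which is in line with the stated aim of allowing one or more probes to fail.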

Biomolecules, such as proteins, DNA, RNA, DNA from amplified products and native rRNA from the 16S rRNA gene, for example, can be probed by the array of the present embodiments. In some embodiments, probes are designed to be antisense to the native rRNA so that directly labeled rRNA from samples can be placed directly on the array to identify a majority of the actively metabolizing organisms in a sample with no bias from PCR amplification. Actively metabolizing organisms have significantly higher numbers of ribosomes used for the production of proteins; therefore, in some embodiments, the capacity of a certain organism to make proteins at a particular point in time can be measured. This is not possible in systems where only the 16S rRNA gene DNA is measured, which encodes only the potential to make proteins and is the same whether an organism is actively metabolizing, quiescent or dead. In this way, the array system of the present embodiments can directly identify the metabolizing organisms within diverse communities.

In some embodiments, the array system is able to measure the microbial diversity of complex communities without PCR amplification, and consequently, without all of the inherent biases associated with PCR amplification. Actively metabolizing cells typically have about 20,000 or more ribosomal copies within their cell for protein assembly compared to quiescent or dead cells that have few. In some embodiments, rRNA can be purified directly from environmental samples and processed with no amplification step, thereby avoiding any of the biases caused by the preferential amplification of some sequences over others. Thus, in some embodiments the signal from the array system can reflect the true number of rRNA molecules that are present in the samples, which can be expressed as the number of cells multiplied by the number of rRNA copies within each cell. The number of cells in a sample can then be inferred by several different methods, such as, for example, quantitative real-time PCR, or FISH (fluorescence in situ hybridization). Then the average number of ribosomes within each cell may be calculated.
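
The relationship stated above (rRNA molecules = number of cells × rRNA copies per cell) can be rearranged to give the average number of ribosomes per cell. The following Python sketch shows only that arithmetic; the function name and the input values are hypothetical, and it assumes the array signal has already been converted to an estimated number of rRNA molecules.

```python
def mean_rrna_copies_per_cell(total_rrna_molecules, cell_count):
    """Average rRNA copies per cell = total rRNA molecules / number of cells."""
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return total_rrna_molecules / cell_count


# Hypothetical sample: 2 x 10^9 rRNA molecules and 1 x 10^5 cells.
copies = mean_rrna_copies_per_cell(2_000_000_000, 100_000)
```

For these illustrative inputs the result is 20,000 copies per cell, the order of magnitude given above for actively metabolizing cells.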

In some embodiments, the samples used can be environmental samples from any environmental source, for example, naturally occurring or artificial atmosphere, water systems, soil or any other sample of interest. In some embodiments, the environmental samples may be obtained from, for example, atmospheric pathogen collection systems, sub-surface sediments, groundwater, ancient water deep within the ground, plant root-soil interface of grassland, coastal water and sewage treatment plants. Because of the ability of the array system to simultaneously test for such a broad range of organisms based on almost all known 16s rRNA gene sequences, the array system of the present embodiments can be used in any environment, which also distinguishes it from other array systems which generally must be targeted to specific environments.

In other embodiments, the sample used with the array system can be any kind of clinical or medical sample. For example, samples from blood, the lungs or the gut of mammals may be assayed using the array system. Also, the array system of the present embodiments can be used to identify an infection in the blood of an animal. The array system of the present embodiments can also be used to assay medical samples that are directly or indirectly exposed to the outside of the body, such as the lungs, ear, nose, throat, the entirety of the digestive system or the skin of an animal. Hospitals currently lack the resources to identify the complex microbial communities that reside in these areas.

Another advantage of the present embodiments is that simultaneous detection of a majority of currently known organisms is possible with one sample. This allows for much more efficient study and determination of particular organisms within a particular sample. Current microarrays do not have this capability. Also, with the array system of the present embodiments, the top metabolizing organisms within a sample can be detected simultaneously without bias from PCR amplification, greatly increasing the efficiency and accuracy of the detection process.

Some embodiments relate to methods of detecting an organism in a sample using the described array system. These methods include contacting a sample with one organism or a plurality of organisms to the array system of the present embodiments and detecting the organism or organisms. In some embodiments, the organism or organisms to be detected are bacteria or archaea. In some embodiments, the organism or organisms to be detected are the most metabolically active organism or organisms in the sample.

Some embodiments relate to a method of fabricating an array system including identifying 16s RNA sequences corresponding to a plurality of organisms of interest, selecting fragments of 16s RNA unique to each organism and creating variant RNA fragments corresponding to the fragments of 16s RNA unique to each organism which comprise at least 1 nucleotide mismatch and then fabricating the array system.

The following examples are provided for illustrative purposes only, and are in no way intended to limit the scope of the present invention.

Example 1

An array system was fabricated using 16s rRNA sequences taken from a plurality of bacterial species. A minimum of 11 different, short oligonucleotide probes were designed for each taxonomic grouping, allowing one or more probes to not bind, but still give a positive signal in the assay. Non-specific cross hybridization is an issue when an abundant 16S rRNA gene shares sufficient sequence similarity to non-targeted probes, such that a weak but detectable signal is obtained. The use of a perfect match-mismatch (PM-MM) probe pair effectively minimized the influence of cross-hybridization. In this technique, the central nucleotide is replaced with any of the three non-matching bases so that the increased hybridization intensity signal of the PM over the paired MM indicates a sequence-specific, positive hybridization. By requiring multiple PM-MM probe-pairs to have a positive interaction, the chance that the hybridization signal is due to a predicted target sequence is substantially increased.

The known 16S rRNA gene sequences larger than 600 bp were grouped into distinct taxa such that a set of at least 11 probes that were specific to each taxon could be selected. The resulting 8,935 taxa (8,741 of which are represented on the array), each containing approximately 3% sequence divergence, represented all 121 demarcated bacterial and archaeal orders. For a majority of the taxa represented on the array (5,737, 65%), probes were designed from regions of 16S rRNA gene sequences that have only been identified within a given taxon. For 1,198 taxa (14%) no probe-level sequence could be identified that was not shared with other groups of 16S rRNA gene sequences, although the gene sequence as a whole was distinctive. For these taxonomic groupings, a set of at least 11 probes was designed to a combination of regions on the 16S rRNA gene that taken together as a whole did not exist in any other taxa. For the remaining 1,806 taxa (21%), a set of probes was selected to minimize the number of putative cross-reactive taxa. Although more than half of the probes in this group have a hybridization potential to one outside sequence, this sequence was typically from a phylogenetically similar taxon. For all three probe set groupings, the advantage of the hybridization approach is that multiple taxa can be identified simultaneously by targeting unique regions or combinations of sequence.
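
As a consistency check on the counts reported in this example, the three probe-design groups can be tallied in a few lines of Python. The dictionary keys are descriptive labels chosen for this sketch and do not appear in the specification; only the numbers are taken from the text.

```python
# Taxon counts as reported in Example 1.
taxa_total = 8935
probe_groups = {
    "unique_region": 5737,               # probes from regions found in one taxon only
    "unique_combination": 1198,          # probes unique only in combination
    "minimized_cross_reactivity": 1806,  # probes chosen to limit cross-reactive taxa
}

taxa_on_array = sum(probe_groups.values())
taxa_not_on_array = taxa_total - taxa_on_array

# Share of each group among the taxa represented on the array, in percent.
share = {
    name: round(100 * count / taxa_on_array, 1)
    for name, count in probe_groups.items()
}
```

The three groups add up to the 8,741 taxa represented on the array, and the computed shares are close to the whole-number percentages quoted above.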

Example 2

An array system was fabricated according to the following protocol. 16S rDNA sequences (Escherichia coli base pair positions 47 to 1473) were obtained from over 30,000 16S rDNA sequences that were at least 600 nucleotides in length in the 15 Mar. 2002 release of the 16S rDNA database, “Greengenes.” This region was selected because it is bounded on both ends by universally conserved segments that can be used as PCR priming sites to amplify bacterial or archaeal genomic material using only 2 to 4 primers. Putative chimeric sequences were filtered from the data set using computer software, preventing them from being misconstrued as novel organisms. The filtered sequences are considered to be the set of putative 16S rDNA amplicons. Sequences were clustered to enable each sequence of a cluster to be complementary to a set of perfectly matching (PM) probes. Putative amplicons were placed in the same cluster as a result of common 17-mers found in the sequence.
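
One way to picture clustering by common 17-mers is the following Python sketch. It is not the clustering software used in this example: the text does not state how many shared 17-mers were required or how clusters were linked, so the sketch makes the simplifying assumption that any single shared 17-mer joins two sequences, with transitive (single-linkage) merging through a union-find structure. All function names are hypothetical.

```python
def kmers(seq, k=17):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(sequences, k=17):
    """Group sequence indices that share at least one k-mer (ASSUMED criterion)."""
    parent = list(range(len(sequences)))

    def find(i):
        # Follow parent links to the cluster representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # k-mer -> index of the first sequence that contained it
    for idx, seq in enumerate(sequences):
        for kmer in kmers(seq, k):
            if kmer in first_owner:
                parent[find(idx)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = idx

    clusters = {}
    for idx in range(len(sequences)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


# Toy sequences: the first two share the 17-mer AAAAAAACCCCCCCCCC.
toy = ["A" * 10 + "C" * 10, "A" * 7 + "C" * 10 + "G" * 3, "G" * 10 + "T" * 10]
groups = cluster_by_shared_kmers(toy)
```

A single-linkage rule of this kind can chain loosely related sequences together, so a practical implementation would likely need a stricter criterion to hold clusters near the approximately 3% divergence described in this example.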

The resulting 8,988 clusters, each containing approximately 3% sequence divergence, were considered operational taxonomic units (OTUs) representing all 121 demarcated prokaryotic orders. The taxonomic family of each OTU was assigned according to the placement of its member organisms in Bergey's Taxonomic Outline. The taxonomic outline as maintained by Philip Hugenholtz was consulted for phylogenetic classes containing uncultured environmental organisms or unclassified families belonging to named higher taxa. The OTUs comprising each family were clustered into sub-families by transitive sequence identity. Altogether, 842 sub-families were found. The taxonomic position of each OTU as well as the accompanying NCBI accession numbers of the sequences composing each OTU are recorded and publicly available.

The objective of the probe selection strategy was to obtain an effective set of probes capable of correctly categorizing mixed amplicons into their proper OTU. For each OTU, a set of 11 or more specific 25-mers (probes) was sought that were prevalent in members of a given OTU but were dissimilar from sequences outside the given OTU. In the first step of probe selection for a particular OTU, each of the sequences in the OTU was separated into overlapping 25-mers, the potential targets. Then each potential target was matched to as many sequences of the OTU as possible. First, a text pattern was used for a search to match potential targets and sequences; however, since partial gene sequences were included in the reference set, additional methods were performed. Therefore, the multiple sequence alignment provided by Greengenes was used to provide a discrete measurement of group size at each potential probe site. For example, if an OTU containing seven sequences possessed a probe site where one member was missing data, then the site-specific OTU size was only six.
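
The first selection step can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that missing data in an aligned sequence is marked with a "." character; the text does not specify the marker, so it is exposed as a parameter. The sketch also treats a probe site as a fixed window of alignment columns, which ignores alignment gaps.

```python
def overlapping_kmers(seq, k=25):
    """Split a sequence into all overlapping k-mers (the potential targets)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def site_specific_otu_size(aligned_members, start, width=25, missing="."):
    """Count OTU members that have complete data across one probe site.

    aligned_members: equal-length aligned sequences of one OTU.
    missing: ASSUMED marker for missing data in the alignment.
    """
    count = 0
    for seq in aligned_members:
        window = seq[start:start + width]
        if len(window) == width and missing not in window:
            count += 1
    return count


# Toy OTU of seven aligned sequences; the last one lacks its first 30 positions.
members = ["ACGT" * 10] * 6 + ["." * 30 + "ACGTACGTAC"]
targets = overlapping_kmers(members[0])
size_at_start = site_specific_otu_size(members, start=0)
```

For this toy OTU the site at the start of the alignment has a site-specific size of six, mirroring the seven-sequence example in the paragraph above.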

In ranking the possible targets, those having data for all members of that OTU were preferred over those found only in a fraction of the OTU members. In the second step, a subset of the prevalent targets was selected and reverse complemented into probe orientation, avoiding those capable of mis-hybridization to an unintended amplicon. Probes presumed to have the capacity to mis-hybridize were those 25-mers that contained a central 17-mer matching sequences in more than one OTU. Thus, probes that were unique to an OTU solely due to a distinctive base in one of the outer four bases were avoided. Also, probes with mis-hybridization potential to sequences having a common tree node near the root were favored over those with a common node near the terminal branch.
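
The reverse-complement step and the central 17-mer screen can be illustrated with a short Python sketch. The function names and the dictionary layout (OTU name mapped to a list of sequences) are hypothetical; the slice 4:21 follows from the text, since removing the outer four bases on each side of a 25-mer leaves the central 17 bases. The sketch uses exact substring matching only and does not model the tree-based preference described in the last sentence above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def central_17mer(target_25mer):
    """Drop the outer four bases on each side of a 25-mer target."""
    return target_25mer[4:21]


def may_mishybridize(target_25mer, otu_sequences, own_otu):
    """True if the central 17-mer also occurs in a sequence of another OTU."""
    core = central_17mer(target_25mer)
    return any(
        core in seq
        for otu, seqs in otu_sequences.items()
        if otu != own_otu
        for seq in seqs
    )


# Hypothetical target and reference sequences for illustration only.
target = "AAAA" + "C" * 17 + "GGGG"
otus = {
    "otu1": ["TT" + target + "TT"],
    "otu2": ["T" * 5 + "C" * 17 + "T" * 5],
}
flagged = may_mishybridize(target, otus, own_otu="otu1")
probe = reverse_complement(target)
```

Here the target is flagged because its central 17-mer also occurs in a sequence of the second OTU, even though the full 25-mer does not; this is the situation in which uniqueness rests only on the outer four bases.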

Probes complementary to target sequences that were selected for fabrication were termed perfectly matching (PM) probes. As each PM probe was chosen, it was paired with a control 25-mer (mismatching probe, MM), identical in all positions except the thirteenth base. The MM probe did not contain a central 17-mer complementary to sequences in any OTU. The probe complementing the target (PM) and MM probes constitute a probe pair analyzed together.

The chosen oligonucleotides were synthesized by a photolithographic method at Affymetrix Inc. (Santa Clara, Calif., USA) directly onto a 1.28 cm by 1.28 cm glass surface at an approximate density of 10,000 probes per μm². Each unique probe sequence on the array had a copy number of roughly 3.2×10⁶ (personal communication, Affymetrix). The entire array of 506,944 features was arranged as a square grid of 712 rows and columns. Of these features, 297,851 were oligonucleotide 16S rDNA PM or MM probes, and the remaining were used for image orientation, normalization controls or other unrelated analyses. Each DNA chip had two kinds of controls on it: 1) probes that target amplicons of prokaryotic metabolic genes spiked into the 16S rDNA amplicon mix in defined quantities just prior to fragmentation and 2) probes complementary to a pre-labeled oligonucleotide added into the hybridization mix. The first control collectively tested the fragmentation, biotinylation, hybridization, staining and scanning efficiency. It also allowed the overall fluorescent intensity to be normalized across all the arrays in an experiment. The second control directly assayed the hybridization, staining and scanning.

Example 3

A study of the diverse and dynamic bacterial populations in urban aerosols was conducted using an array system of certain embodiments. Air samples were collected using an air filtration collection system under vacuum located within six EPA air quality network sites in both San Antonio and Austin, Tex. Approximately 10 liters of air per minute were collected in a polyethylene terephthalate (Celanex), 1.0 μm filter (Hoechst Celanese). Samples were collected daily over a 24 h period. Sample filters were washed in 10 mL buffer (0.1 M Sodium Phosphate, 10 mM EDTA, pH 7.4, 0.01% Tween-20), and the suspension was stored frozen until extracted. Samples were collected from 4 May to 29 Aug. 2003.

Sample dates were divided according to a 52-week calendar year starting Jan. 1, 2003, with each Monday to Sunday cycle constituting a full week. Samples from four randomly chosen days within each sample week were extracted. Each date chosen for extraction consisted of 0.6 mL filter wash from each of the six sampling sites for that city (San Antonio or Austin) combined into a “day pool” before extraction. In total, for each week, 24 filters were sampled.

The “day pools” were centrifuged at 16,000×g for 25 min and the pellets were resuspended in 400 μL sodium phosphate buffer (100 mM, pH 8). The resuspended pellets were transferred into 2 mL silica bead lysis tubes containing 0.9 g of silica/zirconia lysis bead mix (0.3 g of 0.5 mm zirconia/silica beads and 0.6 g of 0.1 mm zirconia/silica beads). For each lysis tube, 300 μL buffered sodium dodecyl sulfate (SDS) (100 mM sodium chloride, 500 mM Tris pH 8, 10% [w/v] SDS), and 300 μL phenol:chloroform:isoamyl alcohol (25:24:1) were added. Lysis tubes were inverted and flicked three times to mix buffers before bead mill homogenization with a Bio101 Fast Prep 120 machine (Qbiogene, Carlsbad, Calif.) at 6.5 m s⁻¹ for 45 s. Following centrifugation at 16,000×g for 5 min, the aqueous supernatant was removed to a new 2 mL tube and kept at −20° C. for 1 hour to overnight. An equal volume of chloroform was added to the thawed supernatant prior to vortexing for 5 s and centrifugation at 16,000×g for 3 min. The supernatant was then combined with two volumes of a binding buffer “Solution 3” (UltraClean Soil DNA kit, MoBio Laboratories, Solana Beach, Calif.). Genomic DNA from the mixture was isolated on a MoBio spin column, washed with “Solution 4” and eluted in 60 μL of 1× Tris-EDTA according to the manufacturer's instructions. The DNA was further purified by passage through a Sephacryl S-200 HR spin column (Amersham, Piscataway, N.J., USA) and stored at 4° C. prior to PCR amplification. DNA was quantified using a PicoGreen fluorescence assay according to the manufacturer's recommended protocol (Invitrogen, Carlsbad, Calif.).

The 16S rRNA gene was amplified from the DNA extract using universal primers 27F.1 (5′ AGRGTTTGATCMTGGCTCAG) (SEQ ID NO: 1) and 1492R (5′ GGTTACCTTGTTACGACTT) (SEQ ID NO: 2). Each PCR reaction mix contained 1× Ex Taq buffer (Takara Bio Inc, Japan), 0.8 mM dNTP mixture, 0.02 U/μL Ex Taq polymerase, 0.4 mg/mL bovine serum albumin (BSA), and 1.0 μM each primer. PCR conditions were 1 cycle of 3 min at 95° C., followed by 35 cycles of 30 sec at 95° C., 30 sec at 53° C., and 1 min at 72° C., and finishing with 7 min incubation at 72° C. When the total mass of PCR product for a sample week reached 2 μg (by gel quantification), all PCR reactions for that week were pooled and concentrated to a volume less than 40 μL with a Microcon YM100 spin filter (Millipore, Billerica, Mass.) for microarray analysis.

The pooled PCR product was spiked with known concentrations of synthetic 16S rRNA gene fragments and non-16S rRNA gene fragments according to Table S1. This mix was fragmented using DNAse I (0.02 U/μg DNA, Invitrogen, CA) and One-Phor-All buffer (Amersham, N.J.) per Affymetrix's protocol, with incubation at 25° C. for 10 min, followed by enzyme denaturation at 98° C. for 10 min. Biotin labeling was performed using an Enzo® BioArray™ Terminal Labeling Kit (Enzo Life Sciences Inc., Farmingdale, N.Y.) per the manufacturer's directions. The labeled DNA was then denatured (99° C. for 5 min) and hybridized to the DNA microarray at 48° C. overnight (>16 hr). The microarrays were washed and stained per the Affymetrix protocol.

The array was scanned using a GeneArray Scanner (Affymetrix, Santa Clara, Calif., USA). The scan was recorded as a pixel image and analyzed using standard Affymetrix software (Microarray Analysis Suite, version 5.1) that reduced the data to an individual signal value for each probe. Background probes were identified as those producing intensities in the lowest 2% of all intensities. The average intensity of the background probes was subtracted from the fluorescence intensity of all probes. The noise value (N) was the variation in pixel intensity signals observed by the scanner as it read the array surface. The standard deviation of the pixel intensities within each of the identified background cells was divided by the square root of the number of pixels comprising that cell. The average of the resulting quotients was used for N in the calculations described below.
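
By way of illustration only, the background subtraction and noise calculation described above may be sketched as follows. The input layout (one tuple of mean intensity, pixel standard deviation and pixel count per cell) and the function name are assumptions made for this sketch and do not reflect the Affymetrix software.

```python
# Illustrative sketch: background subtraction and noise value N.
import math


def background_and_noise(cells, fraction=0.02):
    """cells: list of (mean_intensity, pixel_sd, pixel_count), one per probe cell.

    Returns (background-subtracted intensities, noise value N).
    """
    # Background cells: those with intensities in the lowest `fraction` of all cells.
    n_bg = max(1, round(len(cells) * fraction))
    background_cells = sorted(cells, key=lambda c: c[0])[:n_bg]

    # Average background intensity, subtracted from every cell.
    background = sum(c[0] for c in background_cells) / n_bg
    corrected = [c[0] - background for c in cells]

    # N: average over background cells of (pixel SD / sqrt(pixel count)).
    noise = sum(c[1] / math.sqrt(c[2]) for c in background_cells) / n_bg
    return corrected, noise
```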

Probe pairs scored as positive were those that met two criteria: (i) the intensity of fluorescence from the perfectly matched probe (PM) was greater than 1.3 times the intensity from the mismatched control (MM), and (ii) the difference in intensity, PM minus MM, was at least 130 times greater than the squared noise value (>130 N²). These two criteria were chosen empirically to provide stringency while maintaining sensitivity to the amplicons known to be present from sequencing results of cloning the San Antonio week 29 sample. The positive fraction (PosFrac) was calculated for each probe set as the number of positive probe pairs divided by the total number of probe pairs in a probe set. A taxon was considered present in the sample when over 92% of its assigned probe pairs for its corresponding probe set were positive (PosFrac>0.92). This was determined based on empirical data from clone library analyses. Hybridization intensity (hereafter referred to as intensity) was calculated in arbitrary units (a.u.) for each probe set as the trimmed average (maximum and minimum values removed before averaging) of the PM minus MM intensity differences across the probe pairs in a given probe set. All intensities <1 were shifted to 1 to avoid errors in subsequent logarithmic transformations. When summarizing chip results to the sub-family, the probe set producing the highest intensity was used.
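
By way of illustration only, the probe-pair scoring, PosFrac and trimmed-average intensity described above may be sketched as follows. The function names are hypothetical, and the strict inequality in criterion (ii) follows the parenthetical ">130 N²".

```python
# Illustrative sketch: score one probe set from its (PM, MM) intensity pairs.
def score_probe_set(pairs, noise, ratio=1.3, noise_factor=130.0):
    """pairs: list of (pm, mm) background-subtracted intensities.

    Returns (PosFrac, intensity) for the probe set.
    """
    positive = [
        pm > ratio * mm and (pm - mm) > noise_factor * noise ** 2
        for pm, mm in pairs
    ]
    pos_frac = sum(positive) / len(pairs)

    # Trimmed average of PM - MM: drop one minimum and one maximum value.
    diffs = sorted(pm - mm for pm, mm in pairs)
    trimmed = diffs[1:-1] if len(diffs) > 2 else diffs
    intensity = sum(trimmed) / len(trimmed)

    # Intensities below 1 are shifted to 1 ahead of any log transform.
    return pos_frac, max(1.0, intensity)


def taxon_present(pos_frac, threshold=0.92):
    """A taxon is called present when PosFrac exceeds the threshold."""
    return pos_frac > threshold
```

For example, with N = 1 and pairs (500, 100), (600, 100), (700, 100) and (120, 100), three of the four pairs are positive, so PosFrac is 0.75 and the taxon is not called present.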

To compare the diversity of bacteria detected with microarrays to a known standard, one sample week was chosen for cloning and sequencing and for replicate microarray analysis. One large pool of SSU amplicons (96 reactions, 50 μL/reaction) from San Antonio week 29 was made. One milliliter of the pooled PCR product was gel purified and 768 clones were sequenced at the DOE Joint Genome Institute (Walnut Creek, Calif.) by standard methods. An aliquot of this pooled PCR product was also hybridized to a microarray (three replicate arrays performed). Sub-families containing a taxon scored as present in all three array replicates were recorded. Individual cloned rRNA genes were sequenced from each terminus, assembled using Phred and Phrap (S9, S10, S11), and were required to pass quality tests of Phred 20 (base call error probability < 10^(−2.0)) to be included in the comparison.

Sequences that appeared chimeric were removed using Bellerophon (S2) with two requirements: (1) the preference score must be less than 1.3 and (2) the divergence ratio must be less than 1.1. The divergence ratio is a new metric implemented to weight the likelihood of a sequence being chimeric according to the similarity of the parent sequences. The more distantly related the parent sequences are to each other relative to their divergence from the chimeric sequence, the greater the likelihood that the inferred chimera is real. This metric uses the average sequence identity between the two fragments of the candidate and their corresponding parent sequences as the numerator, and the sequence identity between the parent sequences as the denominator. All calculations are made using a 300 base pair window on either side of the most likely break point. A divergence ratio of 1.1 was empirically determined to be the threshold for classifying sequences as putatively chimeric.
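
By way of illustration only, the divergence ratio described above may be sketched as follows. The sketch assumes that the candidate and both parents are already aligned to the same coordinates and have equal length, that identity is the simple fraction of matching positions, and that the parent-to-parent identity is taken over the whole window. None of these details is stated in the text, and the function names are hypothetical.

```python
# Illustrative sketch: divergence ratio around a putative chimera break point.
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def divergence_ratio(candidate, parent_left, parent_right, break_point, window=300):
    """Average fragment-to-parent identity divided by parent-to-parent identity."""
    start = max(0, break_point - window)
    end = min(len(candidate), break_point + window)
    left = slice(start, break_point)
    right = slice(break_point, end)

    # Numerator: mean identity of each candidate fragment to its own parent.
    fragment_identity = (
        identity(candidate[left], parent_left[left])
        + identity(candidate[right], parent_right[right])
    ) / 2

    # Denominator: identity between the two parents over the same window.
    parent_identity = identity(parent_left[start:end], parent_right[start:end])
    return fragment_identity / parent_identity
```

In this sketch, fragments that match their parents perfectly while the parents share only half their positions give a ratio of 2.0; closely related parents push the ratio toward 1.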

Similarity of clones to array taxa was calculated with DNADIST (S12) using the DNAML-F84 option assuming a transition:transversion ratio of 2.0 and an A, C, G, T 16S rRNA gene base frequency of 0.2537, 0.2317, 0.3167, 0.1979, respectively. We calculated these parameters empirically from all records of the ‘Greengenes’ 16S rRNA multiple sequence alignment over 1,250 nucleotides in length. The Lane mask (S13) was used to restrict similarity observations to 1,287 conserved columns (lanes) of aligned characters. Cloned sequences from this study were rejected from further analysis when <1,000 characters could be compared to a lane-masked reference sequence. Sequences were assigned to a taxonomic node using a sliding scale of similarity threshold (S14). Phylum, class, order, family, sub-family, or taxon placement was accepted when a clone surpassed similarity thresholds of 80%, 85%, 90%, 92%, 94%, or 97%, respectively. When similarity to nearest database sequence was <94%, the clone was considered to represent a novel sub-family. A full comparison between clone and array analysis is presented in Table S2.
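
By way of illustration only, the sliding scale of similarity thresholds described above may be sketched as follows. The function names are hypothetical, and "surpassed" is read here as strictly greater than the threshold.

```python
# Illustrative sketch: assign a clone to the deepest rank its similarity supports.
RANK_THRESHOLDS = [
    ("phylum", 0.80),
    ("class", 0.85),
    ("order", 0.90),
    ("family", 0.92),
    ("sub-family", 0.94),
    ("taxon", 0.97),
]


def deepest_rank(similarity):
    """Return the most specific rank whose threshold is surpassed, or None."""
    accepted = None
    for rank, threshold in RANK_THRESHOLDS:
        if similarity > threshold:
            accepted = rank
    return accepted


def is_novel_subfamily(similarity):
    """Similarity below 94% to the nearest database sequence marks a novel sub-family."""
    return similarity < 0.94
```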

Primers targeting sequences within particular taxa/sub-families were generated by ARB's probe design feature (S15). Melting temperatures were constrained from 45° C. to 65° C. with G+C content between 40 and 70%. The probes were chosen to contain 3′ bases non-complementary to sequences outside of the taxon/sub-family. Primers were matched using Primer3 (S16) to create primer pairs (Table S3). Sequences were generated using the Takara enzyme system as described above with the necessary adjustments in annealing temperatures. Amplicons were purified (PureLink PCR Purification Kit, Invitrogen) and sequenced directly or, if there were multiple unresolved sequences, cloned using a TOPO pCR2.1 cloning kit (Invitrogen, CA) according to the manufacturer's instructions. The M13 primer pair was used for clones to generate insert amplicons for sequencing at UC Berkeley's sequencing facility.

To determine whether changes in 16S rRNA gene concentration could be detected using the array, various quantities of distinct rRNA gene types were hybridized to the array in rotating combinations. We chose environmental organisms, organisms involved in bioremediation, and a pathogen of biodefense relevance. 16S rRNA genes were amplified from each of the organisms in Table S4. Then each of these nine distinct 16S rRNA gene standards was tested once in each concentration category spanning 5 orders of magnitude (0 molecules, 6×10⁷, 1.44×10⁸, 3.46×10⁸, 8.30×10⁸, 1.99×10⁹, 4.78×10⁹, 2.75×10¹⁰, 6.61×10¹⁰, 1.59×10¹¹) with concentrations of individual 16S rRNA gene types rotating between arrays such that each array contained the same total of 16S rRNA gene molecules. This is similar to a Latin Square design, although with a 9×11 format matrix.

A taxon (#9389) consisting only of two sequences of Pseudomonas oleovorans that correlated well with environmental variables was chosen for quantitative PCR confirmation of array-observed quantitative shifts. Primers for this taxon were designed using the ARB (S15) probe match function to determine unique priming sites based upon regions detected by array probes. These regions were then input into Primer3 (S16) in order to choose optimal oligonucleotide primers for PCR. Primer quality was further assessed using Beacon Designer v3.0 (Premier BioSoft, Calif.). Primers 9389F2 (CGACTACCTGGACTGACACT) (SEQ ID NO: 3) and 9389R2 (CACCGGCAGTCTCCTTAGAG) (SEQ ID NO: 4) were chosen to amplify a 436 bp fragment.

To test the specificity of this primer pair, we used a nested PCR approach. 16S rRNA genes were amplified using universal primers (27F, 1492R) from pooled aerosol genomic DNA extracts from both Austin and San Antonio, Tex. These products were purified and used as template in PCR reactions using primer set 9389F2-9389R2. Amplicons were then ligated to pCR2.1 and transformed into E. coli TOP10 cells as recommended by the manufacturer (Invitrogen, CA). Five clones were chosen at random for each of the two cities (10 clones total) and inserts were amplified using vector specific primers M13 forward and reverse. Standard Sanger sequencing was performed and sequences were tested for homology against existing database entries (NCBI GenBank, RDPII and Greengenes).

To assay P. oleovorans 16S rRNA gene copies in genomic DNA extracts, we performed real-time quantitative PCR (qPCR) using an iCycler iQ real-time detection system (BioRad, Calif.) with the iQ Sybr® Green Supermix (BioRad, Calif.) kit. Reaction mixtures (final volume, 25 μl) contained 1× iQ Sybr® Green Supermix, 7.5 pmol of each primer, 25 μg BSA, 0.5 μl DNA extract and DNase/RNase free water. Following enzyme activation (95° C., 3 min), up to 50 cycles of 95° C., 30 s; 61° C., 30 s; 85° C., 10 s and 72° C., 45 s were performed. The specific data acquisition step (85° C. for 10 s) was set above the Tm of potential primer dimers and below the Tm of the product to minimize any non-amplicon Sybr Green fluorescence. Copy number of P. oleovorans 16S rRNA gene molecules was quantified by comparing cycle thresholds to a standard curve (in the range of 7.6×10⁰ to 7.6×10⁵ copies μl⁻¹), run in parallel, using cloned P. oleovorans 16S rRNA amplicons generated by PCR using primers M13 forward and reverse. Regression coefficients for the standard curves were typically greater than 0.99, and post amplification melt curve analyses displayed a single peak at 87.5° C., indicative of specific Pseudomonas oleovorans 16S rRNA gene amplification (data not shown).
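
By way of illustration only, quantification against a standard curve may be sketched as follows. The sketch assumes the usual linear relationship between cycle threshold (Ct) and log10 of the starting copy number. The function names are hypothetical, and the squared correlation returned here is one possible reading of the "regression coefficient" mentioned above.

```python
# Illustrative sketch: fit a qPCR standard curve and read copy numbers from Ct.
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)
```

A ten-fold dilution series spanning 7.6×10⁰ to 7.6×10⁵ copies μl⁻¹ would supply six standards to `fit_standard_curve`; the Ct values themselves must come from the instrument.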

To account for scanning intensity variation from array to array, internal standards were added to each experiment. The internal standards were a set of thirteen amplicons generated from yeast and bacterial metabolic genes and five synthetic 16S rRNA-like genes spiked into each aerosol amplicon pool prior to fragmentation. The known concentrations of the amplicons ranged from 4 pM to 605 pM in the final hybridization mix. The intensities resulting from the fifteen corresponding probe sets were natural log transformed. Adjustment factors for each array were calculated by fitting a linear model using the least-squares method. An array's adjustment factor was subtracted from each probe set's ln(intensity).
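
By way of illustration only, one possible form of the array adjustment may be sketched as follows. The text does not specify the linear model. The sketch assumes an additive model in which ln(intensity) of each spike equals a spike effect plus an array effect. With every spike measured on every array and the array effects constrained to sum to zero, the least-squares array effect is each array's mean spike ln(intensity) minus the grand mean. The function names are hypothetical.

```python
# Illustrative sketch: per-array adjustment factors from spike-in intensities.
# Assumed model: ln(intensity[array][spike]) = spike_effect + array_effect.
import math


def adjustment_factors(spike_intensities):
    """spike_intensities: dict of array id -> list of spike probe-set intensities,
    with the same spikes in the same order for every array.

    Returns dict of array id -> adjustment factor on the natural-log scale.
    """
    array_means = {
        array_id: sum(math.log(v) for v in values) / len(values)
        for array_id, values in spike_intensities.items()
    }
    grand_mean = sum(array_means.values()) / len(array_means)
    return {array_id: mean - grand_mean for array_id, mean in array_means.items()}


def normalize(ln_intensity, factor):
    """Subtract an array's adjustment factor from a probe set's ln(intensity)."""
    return ln_intensity - factor
```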

For each day of aerosol sampling, 15 factors including humidity, wind, temperature, precipitation, pressure, particulate matter, and week of year were recorded from the U.S. National Climatic Data Center (http://www.ncdc.noaa.gov) or the Texas Natural Resource Conservation Commission (http://www.tceq.state.tx.us). The weekly mean, minimum, maximum, and range of values were calculated for each factor from the collected data. The changes in ln(intensity) for each taxon considered present in the study were tested for correlation against the environmental conditions. The resulting p-values were adjusted using the step-up False Discovery Rate (FDR) controlling procedure (S18).
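
By way of illustration only, a step-up FDR adjustment may be sketched as follows. The sketch assumes that the procedure cited as S18 is the Benjamini-Hochberg step-up procedure, which the text does not state explicitly; the function name is hypothetical.

```python
# Illustrative sketch: Benjamini-Hochberg step-up adjustment of p-values.
def benjamini_hochberg(p_values):
    """Return adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```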

Multivariate regression tree analysis (S19, S20) was carried out using the package ‘mvpart’ within the ‘R’ statistical programming environment. A Bray-Curtis-based distance matrix was created using the function ‘gdist’. The Bray-Curtis measure of dissimilarity is generally regarded as a good measure of ecological distance when dealing with ‘species’ abundance as it allows for non-linear responses to environmental gradients (S19, S21).
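
By way of illustration only, the Bray-Curtis dissimilarity may be sketched as follows. This is a plain Python sketch of the standard formula, not the ‘gdist’ function of ‘mvpart’, and the function names are hypothetical.

```python
# Illustrative sketch: Bray-Curtis dissimilarity between abundance profiles.
def bray_curtis(u, v):
    """Sum of absolute differences divided by the sum of all abundances."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


def distance_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities for a list of abundance profiles."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]
```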

Prior to rarefaction analysis a distance matrix (DNAML homology) of clone sequences was created using an online tool at http://greengenes.lbl.gov/cgi-bin/nph-distance_matrix.cgi following alignment of the sequences using the NAST aligner (http://greengenes.lbl.gov/NAST) (S22). DOTUR (S23) was used to generate rarefaction curves, Chao1 and ACE richness predictions and rank-abundance curves. Nearest neighbor joining was used with 1000 iterations for bootstrapping.
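
By way of illustration only, a Chao1 richness estimate may be sketched as follows. The sketch uses the bias-corrected form of the estimator; whether this matches the exact variant reported by DOTUR is an assumption, and the function name is hypothetical.

```python
# Illustrative sketch: bias-corrected Chao1 richness estimate.
def chao1(otu_counts):
    """otu_counts: number of clones observed in each OTU (zeros are ignored)."""
    observed = [c for c in otu_counts if c > 0]
    s_obs = len(observed)
    singletons = sum(1 for c in observed if c == 1)
    doubletons = sum(1 for c in observed if c == 2)
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))
```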

DNA yields in the pooled weekly filter washes ranged from 0.522 ng to 154 ng. As only an aliquot of the filter washes was extracted we extrapolate the range of DNA extractable from each daily filter to be between 150 ng and 4300 ng assuming 10% extraction efficiency. Using previous estimates of bacterial to fungal ratios in aerosols (49% bacterial, 44% fungal clones; S24) this range is equivalent to 1.2×10⁷ to 3.5×10⁸ bacterial cells per filter assuming a mean DNA content of a bacterial cell of 6 fg (S25).
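
By way of illustration only, the conversion from extractable DNA to bacterial cells stated above may be checked as follows. The function name is hypothetical; the 49% bacterial share and the 6 fg of DNA per cell are the values given in the text.

```python
# Illustrative sketch: convert extractable DNA per filter into bacterial cells.
FG_PER_NG = 1e6  # femtograms per nanogram


def bacterial_cells(dna_ng, bacterial_fraction=0.49, fg_per_cell=6.0):
    """Estimated bacterial cells for a given mass of extractable DNA."""
    return dna_ng * FG_PER_NG * bacterial_fraction / fg_per_cell
```

With these values, 150 ng corresponds to about 1.2×10⁷ cells and 4300 ng to about 3.5×10⁸ cells, in agreement with the range stated above.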

TABLE S1 Spike-in controls of functional genes and synthetic 16S rRNA-like genes used for internal array normalization.

Control spike | Molecules applied | Description
Affymetrix control spikes
AFFX-BioB-5_at | 5.83 × 10¹⁰ | E. coli biotin synthetase
AFFX-BioB-M_at | 5.43 × 10¹⁰ | E. coli biotin synthetase
AFFX-BioC-5_at | 2.26 × 10¹⁰ | E. coli bioC protein
AFFX-BioC-3_at | 1.26 × 10¹⁰ | E. coli bioC protein
AFFX-BioDn-3_at | 1.68 × 10¹⁰ | E. coli dethiobiotin synthetase
AFFX-CreX-5_at | 2.17 × 10⁹ | Bacteriophage P1 cre recombinase protein
AFFX-DapX-5_at | 9.03 × 10⁸ | B. subtilis dapB, dihydrodipicolinate reductase
AFFX-DapX-M_at | 3.03 × 10¹⁰ | B. subtilis dapB, dihydrodipicolinate reductase
YFL039C | 5.02 × 10⁸ | Saccharomyces, Gene for actin (Act1p) protein
YER022W | 1.21 × 10⁹ | Saccharomyces, RNA polymerase II mediator complex subunit (SRB4p)
YER148W | 2.91 × 10⁹ | Saccharomyces, TATA-binding protein, general transcription factor (SPT15)
YEL002C | 7.00 × 10⁹ | Saccharomyces, Beta subunit of the oligosaccharyl transferase (OST) glycoprotein complex (WBP1)
YEL024W | 7.29 × 10¹⁰ | Saccharomyces, Ubiquinol-cytochrome-c reductase (RIP1)
Synthetic 16S rRNA control spikes
SYNM.neurolyt_st | 6.74 × 10⁸ | Synthetic derivative of Mycoplasma neurolyticum 16S rRNA gene
SYNLc.oenos_st | 3.90 × 10⁹ | Synthetic derivative of Leuconostoc oenos 16S rRNA gene
SYNCau.cres8_st | 9.38 × 10⁹ | Synthetic derivative of Caulobacter crescentus 16S rRNA gene
SYNFer.nodosm_st | 4.05 × 10¹⁰ | Synthetic derivative of Fervidobacterium nodosum 16S rRNA gene
SYNSap.grandi_st | 1.62 × 10⁹ | Synthetic derivative of Saprospira grandis 16S rRNA gene

TABLE S2 Comparison between clone and array results.

Sub-families | Array detection¹, 3/3 replicates (pass = 1, fail = 0) | Clone detection: number of clones assigned to sub-family² | Clone detection: maximum DNAML similarity³ | Chimera checking⁴: maximum preference score⁵ | Chimera checking⁴: maximum divergence ratio⁶ | Comparison: array only (pass = 1, fail = 0) | Comparison: array and cloning (pass = 1, fail = 0) | Comparison: cloning only (pass = 1, fail = 0)
Bacteria; AD3; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Acidobacteria-10; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Acidobacteria-4; Ellin6075/11-25; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Acidobacteria-6; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Acidobacteria; Acidobacteriales; Acidobacteriaceae; sf_14 | 1 | 3 | 0.973 | 1.16 | 1.06 | 0 | 1 | 0
Bacteria; Acidobacteria; Acidobacteria; Acidobacteriales; Acidobacteriaceae; sf_16 | 1 | | | | | 1 | 0 | 0
Bacteria; Acidobacteria; Solibacteres; Unclassified; Unclassified; sf_1 | 1 | 2 | 0.960 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Acidimicrobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Microthrixineae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Microthrixineae; sf_12 | 0 | 1 | 0.947 | 0.00 | 0.00 | 0 | 0 | 1
Bacteria; Actinobacteria; Actinobacteria; Acidimicrobiales; Unclassified; sf_1 | 1 | 1 | 0.961 | 1.28 | 1.06 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Acidothermaceae; sf_1 | 1 | 1 | 0.947 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Actinomycetaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Actinosynnemataceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Brevibacteriaceae; sf_1 | 1 | 4 | 0.998 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Cellulomonadaceae; sf_1 | 1 | 2 | 0.981 | 1.20 | 1.08 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dermabacteraceae; sf_1 | 1 | 2 | 0.999 | 1.21 | 1.03 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dermatophilaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Dietziaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Frankiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Geodermatophilaceae; sf_1 | 1 | 2 | 1.000 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Gordoniaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Intrasporangiaceae; sf_1 | 1 | 10 | 0.999 | 1.20 | 1.18 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Kineosporiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Microbacteriaceae; sf_1 | 1 | 4 | 0.999 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micrococcaceae; sf_1 | 1 | 2 | 0.985 | 1.26 | 1.15 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1 | 1 | 3 | 1.000 | 1.27 | 1.20 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Mycobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardiaceae; sf_1 | 1 | 1 | 0.999 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardioidaceae; sf_1 | 1 | 4 | 0.994 | 1.16 | 1.07 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardiopsaceae; sf_1 | 1 | 1 | 1.000 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Promicromonosporaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Propionibacteriaceae; sf_1 | 1 | 3 | 0.982 | 1.20 | 1.05 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1 | 1 | 3 | 0.999 | 1.14 | 1.11 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Sporichthyaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_1 | 1 | 3 | 0.998 | 1.30 | 1.14 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_3 | 1 | 2 | 0.996 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptosporangiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Thermomonosporaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Williamsiaceae; sf_1 | 0 | 1 | 0.987 | 1.18 | 1.12 | 0 | 0 | 1
Bacteria; Actinobacteria; Actinobacteria; Bifidobacteriales; Bifidobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Actinobacteria; Actinobacteria; Rubrobacterales; Rubrobacteraceae; sf_1 | 1 | 13 | 0.990 | 1.56 | 1.05 | 0 | 1 | 0
Bacteria; Actinobacteria; Actinobacteria; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Aquificae; Aquificae; Aquificales; Hydrogenothermaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; BRC1; Unclassified; Unclassified; Unclassified; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Porphyromonadaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Prevotellaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Rikenellaceae; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Bacteroidetes; Bacteroidales; Unclassified; sf_15 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Blattabacteriaceae; sf_1 | 1 | 1 | 0.943 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Flavobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Flavobacteria; Flavobacteriales; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; KSA1; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Crenotrichaceae; sf_11 | 1 | 6 | 0.973 | 1.22 | 1.07 | 0 | 1 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Flammeovirgaceae; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Flexibacteraceae; sf_19 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Sphingobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Unclassified; sf_6 | 1 | | | | | 1 | 0 | 0
Bacteria; Bacteroidetes; Unclassified; Unclassified; Unclassified; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Caldithrix; Unclassified; Caldithrales; Caldithraceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Caldithrix; Unclassified; Caldithrales; Caldithraceae; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlamydiae; Chlamydiae; Chlamydiales; Chlamydiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlorobi; Chlorobia; Chlorobiales; Chlorobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_6 | 1 | | | | | 1 | 0 | 0
Bacteria; Chlorobi; Unclassified; Unclassified; Unclassified; sf_9 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Anaerolineae; Chloroflexi-1a; Unclassified; sf_1 | 1 | 1 | 0.992 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Chloroflexi; Anaerolineae; Chloroflexi-1b; Unclassified; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Anaerolineae; Unclassified; Unclassified; sf_9 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Chloroflexi-3; Roseiflexales; Unclassified; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Dehalococcoidetes; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Chloroflexi; Unclassified; Unclassified; Unclassified; sf_12 | 1 | | | | | 1 | 0 | 0
Bacteria; Coprothermobacteria; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Chloroplasts; Chloroplasts; sf_11 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Chloroplasts; Chloroplasts; sf_5 | 1 | 3 | 0.995 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Chroococcales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Chroococcidiopsis; Unclassified; sf_1 | 0 | 1 | 0.954 | 1.09 | 1.12 | 0 | 0 | 1
Bacteria; Cyanobacteria; Cyanobacteria; Leptolyngbya; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Nostocales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Oscillatoriales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Phormidium; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Plectonema; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Prochlorales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Pseudanabaena; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Cyanobacteria; Spirulina; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_8 | 1 | | | | | 1 | 0 | 0
Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_9 | 1 | | | | | 1 | 0 | 0
Bacteria; DSS1; Unclassified; Unclassified; Unclassified; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Deinococcus-Thermus; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Deinococcus-Thermus; Unclassified; Unclassified; Unclassified; sf_3 | 0 | 1 | 0.993 | 1.19 | 1.05 | 0 | 0 | 1
Bacteria; Firmicutes; Bacilli; Bacillales; Alicyclobacillaceae; sf_1 | 1 | 2 | 0.963 | 1.14 | 1.15 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; sf_1 | 1 | 151 | 1.000 | 1.37 | 1.23 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Halobacillaceae; sf_1 | 1 | 6 | 0.997 | 1.15 | 1.07 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Paenibacillaceae; sf_1 | 1 | 14 | 0.999 | 1.19 | 1.07 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Sporolactobacillaceae; sf_1 | 1 | 2 | 0.999 | 1.12 | 1.04 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Staphylococcaceae; sf_1 | 1 | 6 | 0.999 | 1.30 | 1.06 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Bacillales; Thermoactinomycetaceae; sf_1 | 1 | 6 | 0.999 | 1.15 | 1.09 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Exiguobacterium; Unclassified; sf_1 | 0 | 1 | 0.998 | 0.00 | 0.00 | 0 | 0 | 1
Bacteria; Firmicutes; Bacilli; Lactobacillales; Aerococcaceae; sf_1 | 1 | 6 | 0.998 | 1.23 | 1.26 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Carnobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Enterococcaceae; sf_1 | 1 | 3 | 0.999 | 1.32 | 1.08 | 0 | 1 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Lactobacillaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Leuconostocaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Bacilli; Lactobacillales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Catabacter; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Catabacter; Unclassified; Unclassified; sf_4 | 1 | 1 | 0.954 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridiaceae; sf_12 | 1 | 14 | 0.998 | 1.45 | 1.15 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae; sf_5 | 1 | 2 | 0.990 | 1.12 | 1.12 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Peptococc/Acidaminococc; sf_11 | 1 | 4 | 0.980 | 1.12 | 1.16 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Peptostreptococcaceae; sf_5 | 1 | 1 | 0.976 | 1.21 | 1.04 | 0 | 1 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Syntrophomonadaceae; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Clostridia; Clostridiales; Unclassified; sf_17 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Clostridia; Unclassified; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Desulfotomaculum; Unclassified; Unclassified; sf_1 | 1 | 3 | 0.984 | 1.14 | 1.04 | 0 | 1 | 0
Bacteria; Firmicutes; Mollicutes; Acholeplasmatales; Acholeplasmataceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Symbiobacteria; Symbiobacterales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; Unclassified; Unclassified; Unclassified; sf_8 | 1 | | | | | 1 | 0 | 0
Bacteria; Firmicutes; gut clone group; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Gemmatimonadetes; Unclassified; Unclassified; Unclassified; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Natronoanaerobium; Unclassified; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Nitrospira; Nitrospira; Nitrospirales; Nitrospiraceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; OD1; OP11-5; Unclassified; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; OP8; Unclassified; Unclassified; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Anammoxales; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Anammoxales; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Pirellulae; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Planctomycetes; Planctomycetacia; Planctomycetales; Planctomycetaceae; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Acetobacterales; Acetobacteraceae; sf_1 | 1 | 1 | 0.943 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Acetobacterales; Roseococcaceae; sf_1 | 1 | 6 | 0.980 | 1.24 | 1.17 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Azospirillaceae; sf_1 | 1 | 1 | 0.947 | 1.12 | 1.10 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Magnetospirillaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Beijerinck/Rhodoplan/Methylocyst; sf_3 | 1 | 2 | 0.951 | 1.13 | 1.08 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Bradyrhizobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Hyphomicrobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Methylobacteriaceae; sf_1 | 1 | 2 | 0.999 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Unclassified; sf_1 | 1 | 4 | 0.982 | 1.15 | 1.11 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Xanthobacteraceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Caulobacterales; Caulobacteraceae; sf_1 | 1 | 1 | 0.968 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_3 | 1 | 1 | 0.951 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Caedibacteraceae; sf_5 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Consistiales; Unclassified; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Devosia; Unclassified; sf_1 | 1 | 1 | 0.976 | 1.18 | 1.05 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Ellin314/wr0007; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Ellin329/Riz1046; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Fulvimarina; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Bartonellaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Beijerinck/Rhodoplan/Methylocyst; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Bradyrhizobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Brucellaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Hyphomicrobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Phyllobacteriaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Rhizobiaceae; sf_1 | 1 | 2 | 0.981 | 1.27 | 1.26 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Hyphomonadaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; sf_1 | 1 | 6 | 0.985 | 1.13 | 1.11 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Anaplasmataceae; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Rickettsiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Unclassified; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1 | 1 | 9 | 0.994 | 1.23 | 1.10 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_15 | 1 | 6 | 0.990 | 1.13 | 1.06 | 0 | 1 | 0
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Unclassified; sf_1 | 0 | 3 | 0.997 | 1.20 | 1.08 | 0 | 0 | 1
Bacteria; Proteobacteria; Alphaproteobacteria; Unclassified; Unclassified; sf_6 | 1 | 1 | 0.954 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Alcaligenaceae; sf_1 | 1 | 3 | 1.000 | 1.35 | 1.07 | 0 | 1 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Burkholderiaceae; sf_1 | 1 | 12 | 1.000 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Comamonadaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Oxalobacteraceae; sf_1 | 1 | 2 | 0.996 | 0.00 | 0.00 | 0 | 1 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Ralstoniaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; MND1 clone group; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Methylophilales; Methylophilaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Neisseriales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Nitrosomonadales; Nitrosomonadaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales; Rhodocyclaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Betaproteobacteria; Unclassified; Unclassified; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; AMD clone group; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Bdellovibrionales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Desulfobulbaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Nitrospinaceae; sf_2 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfobacterales; Unclassified; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfohalobiaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfovibrionaceae; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; EB1021 group; Unclassified; sf_4 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Myxococcaceae; sf_1 | 0 | 1 | 0.974 | 0.00 | 0.00 | 0 | 0 | 1
Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Polyangiaceae; sf_3 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria; Deltaproteobacteria; Myxococcales; Unclassified; sf_1 | 1 | | | | | 1 | 0 | 0
Bacteria; Proteobacteria;Deltaproteobacteria; 1 1 0 0 Syntrophobacterales; Syntrophobacteraceae;sf_1 Bacteria; Proteobacteria; Deltaproteobacteria; 1 1 0 0Unclassified; Unclassified; sf_9 Bacteria; Proteobacteria;Deltaproteobacteria; 1 1 0 0 dechlorinating clone group; Unclassified;sf_1 Bacteria; Proteobacteria; Epsilonproteobacteria; 1 1 0 0Campylobacterales; Campylobacteraceae; sf_3 Bacteria; Proteobacteria;Epsilonproteobacteria; 1 1 0 0 Campylobacterales; Helicobacteraceae;sf_3 Bacteria; Proteobacteria; Epsilonproteobacteria; 1 1 0 0Campylobacterales; Unclassified; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Aeromonadales; Aeromonadaceae; sf_1Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Alteromonadales;Alteromonadaceae; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 11 0 0 Alteromonadales; Pseudoalteromonadaceae; sf_1 Bacteria;Proteobacteria; Gammaproteobacteria; 1 1 0 0 Alteromonadales;Unclassified; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 00 Chromatiales; Chromatiaceae; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Chromatiales; Ectothiorhodospiraceae; sf_1Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Chromatiales;Unclassified; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 00 Ellin307/WD2124; Unclassified; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 3 0.995 1.12 1.04 0 1 0 Enterobacteriales;Enterobacteriaceae; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria;1 1 0 0 Enterobacteriales; Enterobacteriaceae; sf_6 Bacteria;Proteobacteria; Gammaproteobacteria; 1 1 0 0 GAO cluster; Unclassified;sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0Legionellales; Coxiellaceae; sf_3 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Legionellales; Unclassified; sf_1 Bacteria;Proteobacteria; Gammaproteobacteria; 1 1 0 0 Legionellales;Unclassified; sf_3 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 00 Methylococcales; Methylococcaceae; sf_1 Bacteria; 
Proteobacteria;Gammaproteobacteria; 1 1 0 0 Oceanospirillales; Alcanivoraceae; sf_1Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0Oceanospirillales; Halomonadaceae; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Oceanospirillales; Unclassified; sf_3Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Pasteurellales;Pasteurellaceae; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 20.996 1.16 1.10 0 1 0 Pseudomonadales; Moraxellaceae; sf_3 Bacteria;Proteobacteria; Gammaproteobacteria; 1 2 0.998 1.18 1.03 0 1 0Pseudomonadales; Pseudomonadaceae; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 SUP05; Unclassified; sf_1 Bacteria;Proteobacteria; Gammaproteobacteria; 1 1 0 0 Shewanella; Unclassified;sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Symbionts;Unclassified; sf_1 Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 00 Thiotrichales; Francisellaceae; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 Thiotrichales; Piscirickettsiaceae; sf_3Bacteria; Proteobacteria; Gammaproteobacteria; 1 1 0 0 Thiotrichales;Thiotrichaceae; sf_3 Bacteria; Proteobacteria; Gammaproteobacteria; 1 10 0 Unclassified; Unclassified; sf_3 Bacteria; Proteobacteria;Gammaproteobacteria; 1 2 0.997 0.00 0.00 0 1 0 Xanthomonadales;Xanthomonadaceae; sf_3 Bacteria; Proteobacteria; Gammaproteobacteria; 11 0 0 aquatic clone group; Unclassified; sf_1 Bacteria; Proteobacteria;Gammaproteobacteria; 1 1 0 0 uranium waste clones; Unclassified; sf_1Bacteria; Proteobacteria; Unclassified; 1 1 0 0 Unclassified;Unclassified; sf_20 Bacteria; Spirochaetes; Spirochaetes; 1 1 0 0Spirochaetales; Leptospiraceae; sf_3 Bacteria; Spirochaetes;Spirochaetes; 1 1 0 0 Spirochaetales; Spirochaetaceae; sf_1 Bacteria;Spirochaetes; Spirochaetes; 1 1 0 0 Spirochaetales; Spirochaetaceae;sf_3 Bacteria; TM7; TM7-3; Unclassified; 1 1 0 0 Unclassified; sf_1Bacteria; TM7; Unclassified; Unclassified; 1 1 0 0 Unclassified; sf_1Bacteria; Verrucomicrobia; Unclassified; 1 1 0 0 
Unclassified; Unclassified; sf_4 Bacteria; Verrucomicrobia; Unclassified; 1 1 0 0 Unclassified; Unclassified; sf_5 Bacteria; Verrucomicrobia; Verrucomicrobiae; 1 1 0 0 Verrucomicrobiales; Unclassified; sf_3 Bacteria; Verrucomicrobia; Verrucomicrobiae; 1 1 0 0 Verrucomicrobiales; Verrucomicrobia subdivision 5; sf_1 Bacteria; Verrucomicrobia; Verrucomicrobiae; 1 1 0 0 Verrucomicrobiales; Verrucomicrobiaceae; sf_6 Bacteria; Verrucomicrobia; Verrucomicrobiae; 1 1 0 0 Verrucomicrobiales; Verrucomicrobiaceae; sf_7 Bacteria; WS3; Unclassified; Unclassified; 1 1 0 0 Unclassified; sf_1 Bacteria; marine group A; mgA-1; Unclassified; 1 1 0 0 Unclassified; sf_1 Bacteria; marine group A; mgA-2; Unclassified; 1 1 0 0 Unclassified; sf_1
Totals: Array sub-families 238; Clone sub-families 67; Array only sub-families 178; Array and clone sub-families 60; Clone only sub-families 7
¹A sub-family must have at least one taxon present above the positive probe threshold of 0.92 (92%) in all three replicates to be considered present.
²For a clone to be assigned to a sub-family, its DNAML similarity must be above the 0.94 (94%) threshold defined for sub-families.
³This is the maximum DNAML similarity measured.
⁴Both the maximum preference score and the maximum divergence ratio must pass the criteria below for a clone to be considered non-chimeric.
⁵Bellerophon preference score. A ratio of 1.3 or greater has been empirically shown to indicate a chimeric molecule.
⁶Bellerophon divergence ratio. This is a new metric devised to aid chimera detection; a score greater than 1.1 indicates a potential chimera.
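The thresholds stated in the footnotes to this table can be written out as simple checks. The following is only a minimal sketch under stated assumptions: the function names and the input layout (a mapping from a taxon identifier to its three replicate positive fractions) are hypothetical, and the positive fraction is compared with "greater than or equal to" because Table S7 describes detection as "92% or greater" of probes positive.

```python
# Thresholds taken from the footnotes; names below are illustrative only.
POSITIVE_FRACTION_THRESHOLD = 0.92   # footnote 1
SUBFAMILY_SIMILARITY_THRESHOLD = 0.94  # footnote 2
PREFERENCE_SCORE_CUTOFF = 1.3        # footnote 5: 1.3 or greater suggests a chimera
DIVERGENCE_RATIO_CUTOFF = 1.1        # footnote 6: greater than 1.1 suggests a chimera


def subfamily_present(taxon_replicate_fractions):
    """Return True if at least one taxon passes the threshold in all three replicates.

    taxon_replicate_fractions: dict mapping a taxon id to a list of three
    positive fractions, one per replicate (assumed layout).
    """
    return any(
        len(fractions) == 3
        and all(f >= POSITIVE_FRACTION_THRESHOLD for f in fractions)
        for fractions in taxon_replicate_fractions.values()
    )


def clone_in_subfamily(max_dnaml_similarity):
    """Footnote 2: the similarity must be above the sub-family threshold."""
    return max_dnaml_similarity > SUBFAMILY_SIMILARITY_THRESHOLD


def clone_non_chimeric(max_preference_score, max_divergence_ratio):
    """Footnote 4: both scores must pass for a clone to count as non-chimeric."""
    return (
        max_preference_score < PREFERENCE_SCORE_CUTOFF
        and max_divergence_ratio <= DIVERGENCE_RATIO_CUTOFF
    )
```

For example, a clone with a maximum preference score of 1.12 and a maximum divergence ratio of 1.04 passes both checks, whereas one scoring 1.45 and 1.15 fails both.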

TABLE S3 Confirmation of array sub-family detections by taxon-specific PCR and sequencing. Tm = Melting temperature; Ta = Optimal annealing temperature used in PCR reaction.

GenBank accession number of retrieved sequence | Sub-family (sf) verified | Closest BLAST homolog, GenBank accession number (% identity) | SEQ ID NO. | Primer sequences (5′ to 3′) | Tm (°C) | Ta (°C)
DQ236248 | Actinobacteria, Actinosynnemataceae, sf_1 | Actinokineospora diospyrosa, AF114797 (94.3%) | 5; 6 | For-ACCAAGGCTACGACGGGTA; Rev-ACACACCGCATGTCAAACC | 60.5; 60.4 | 67.0
DQ515230 | Actinobacteria, Bifidobacteriaceae, sf_1 | Bifidobacterium adolescentis, AF275881 (99.6%) | 7; 8 | For-GGGTGGTAATGCCSGATG; Rev-CCRCCGTTACACCGGGAA | 60.0; 64.0 | 62.0
DQ236245 | Actinobacteria, Kineosporiaceae, sf_1 | Actinomycetaceae SR 11, X87617 (97.7%) | 9; 10 | For-CAATGGACTCAAGCCTGATG; Rev-CTCTAGCCTGCCCGTTTC | 53.5; 53.9 | 53.0
DQ236250 | Chloroflexi, Anaerolineae, sf_9 | penguin droppings clone KD4-96, AY218649 (90%) | 11; 12 | For-GAGAGGATGATCAGCCAG; Rev-TACGGYTACCTTGTTACGACTT | 54.0; 57.0 | 61.7
DQ236247 | Cyanobacteria, Geitlerinema, sf_1 | Geitlerinema sp. PCC 7105, AB039010 (89.3%) | 13; 14 | For-TCCGTAGGTGGCTGTTCAAGTCTG; Rev-GCTTTCGTCCCTCAGTGTCAGTTG | 62.2; 61.7 | 55.0
DQ236246 | Cyanobacteria, Thermosynechococcus, sf_1 | Thermosynechococcus elongatus BP-1, BA000039 (96.0%) | 15; 16 | For-TGTCGTGAGATGTTGGGTTAAGTC; Rev-TGAGCCGTGGTTTAAGAGATTAGC | 58.7; 58.8 | 55.0
DQ129654 | Gammaproteobacteria, Pseudoalteromonadaceae, sf_1 | Pseudoalteromonas sp. S511-1, AB029824 (99.1%) | 17; 18 | For-GCCTCACGCCATAAGATTAG; Rev-GTGCTTTCTTCTGTAAGTAACCG | 53.1; 53.0 | 50.0
DQ129656 | Nitrospira, Nitrospiraceae, sf_1 | Nitrospira moscoviensis, X82558 (98.5%) | 19; 20 | For-TCGAAAAGCGTGGGG; Rev-CTTCCTCCCCCGTTC | 57.6; 54.4 | 47.0
DQ129666 | Planctomycetes, Planctomycetaceae, sf_3 | Planctomyces brasiliensis, AJ231190 (94%) | 21; 22 | For-GAAACTGCCCAGACAC; Rev-AGTAACGTTCGCACAG | 50.0; 48.0 | 60.0
DQ515231 | Proteobacteria, Campylobacteraceae, sf_3 | Uncultured Arcobacter sp. clone DS017, DQ234101 (98%) | 23; 24 | For-GGATGACACTTTTCGGAG; Rev-AATTCCATCTGCCTCTCC | 54.0; 55.0 | 48.0
DQ129662 | Spirochaetes, Leptospiraceae, sf_3 | Leptospira borgpetersenii, X17547 (90.9%) | 25; 26 | For-GGCGGCGCGTTTTAAGC; Rev-ACTCGGGTGGTGTGACG | 57.0; 57.0 | 58.7
DQ129661 | Spirochaetes, Spirochaetaceae, sf_1 | Spirochaeta asiatica, X93926 (90.0%) | | | |
DQ129660 | Spirochaetes, Spirochaetaceae, sf_3 | Borrelia hermsii, M72398 (91.0%) | | | |
DQ236249 | TM7, TM7-3, sf_1 | oral clone EW096, AY349415 (88.8%) | 27; 28 | For-AYTGGGCGTAAAGAGTTGC; Rev-TACGGYTACCTTGTTACGACTT | 58.0; 57.0 | 66.3

TABLE S4 Bacteria and Archaea used for Latin square hybridization assays.

Organism | Phylum/Sub-phylum | ATCC
Arthrobacter oxydans | Actinobacteria | 14359^(a)
Bacillus anthracis AMES pX01-pX02- | Firmicutes | —^(b)
Caulobacter crescentus CB15 | Alpha-proteobacteria | 19089
Dechloromonas agitata CKB | Beta-proteobacteria | 700666^(c)
Dehalococcoides ethenogenes 195 | Chloroflexi | —^(d)
Desulfovibrio vulgaris Hildenborough | Delta-proteobacteria | 29579^(e)
Francisella tularensis | Gamma-proteobacteria | 6223
Geobacter metallireducens GS-15 | Delta-proteobacteria | 53774^(c)
Geothrix fermentans H-5 | Acidobacteria | 700665^(c)
Sulfolobus solfataricus | Crenarchaeota | 35092

^(a)Strain obtained from Hoi-Ying Holman, LBNL.
^(b)Strain obtained from Arthur Friedlander, USAMRIID.
^(c)Strain obtained from John Coates, UC Berkeley.
^(d)Strain obtained from Lisa Alvarez-Cohen, UC Berkeley.
^(e)Strain obtained from Terry Hazen, LBNL.

TABLE S5 Correlations between environmental/temporal parameters.

Parameter | week | mean TEMP | max MAXTEMP | min MINTEMP | range MINTEMP | mean WDSP | mean SLP | max VISIB | max PM2.5 | range PM2.5 | Sub-family-level richness

Austin
week | 1.000
mean TEMP | 0.703 | 1.000
max MAXTEMP | 0.471 | 0.665 | 1.000
min MINTEMP | 0.685 | 0.691 | 0.073 | 1.000
range MINTEMP | −0.267 | −0.149 | 0.571 | −0.777 | 1.000
mean WDSP | −0.540 | −0.053 | −0.038 | −0.195 | 0.136 | 1.000
mean SLP | 0.607 | 0.145 | 0.162 | 0.352 | −0.188 | −0.380 | 1.000
max VISIB | 0.486 | 0.311 | 0.400 | 0.230 | 0.063 | −0.498 | 0.318 | 1.000
max PM2.5 | −0.529 | −0.219 | −0.331 | −0.162 | −0.075 | 0.617 | −0.409 | −0.817 | 1.000
range PM2.5 | −0.507 | −0.219 | −0.366 | −0.117 | −0.134 | 0.613 | −0.407 | −0.829 | 0.989 | 1.000
Sub-family-level richness | −0.074 | −0.104 | 0.098 | −0.460 | 0.440 | 0.251 | −0.182 | −0.066 | −0.058 | −0.063 | 1.000

San Antonio
week | 1.000
mean TEMP | 0.452 | 1.000
max MAXTEMP | 0.189 | 0.553 | 1.000
min MINTEMP | 0.570 | 0.622 | 0.044 | 1.000
range MINTEMP | −0.318 | −0.116 | 0.630 | −0.749 | 1.000
mean WDSP | −0.523 | −0.014 | −0.015 | −0.014 | 0.001 | 1.000
mean SLP | 0.722 | 0.029 | −0.088 | 0.300 | −0.291 | −0.495 | 1.000
max VISIB | 0.420 | 0.169 | 0.298 | −0.054 | 0.240 | −0.234 | 0.501 | 1.000
max PM2.5 | −0.508 | −0.157 | −0.197 | −0.022 | −0.114 | 0.189 | −0.420 | −0.830 | 1.000
range PM2.5 | −0.515 | −0.164 | −0.201 | 0.000 | −0.134 | 0.255 | −0.455 | −0.843 | 0.991 | 1.000
Sub-family-level richness | 0.125 | −0.016 | −0.050 | 0.024 | −0.051 | −0.419 | 0.175 | −0.054 | −0.064 | −0.102 | 1.000

Underlined font indicates a significant positive correlation, while italic font indicates a significant negative correlation at a 95% confidence interval.
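A lower-triangular matrix of pairwise coefficients such as the one in Table S5 could be computed along the following lines. This is a minimal sketch that assumes the coefficients are Pearson correlations, which the table does not state; the function names and the example series are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(series):
    """Return {(row, column): r} for the lower triangle, diagonal included.

    series: dict mapping a parameter name to its list of weekly values.
    """
    names = list(series)
    return {
        (row, col): pearson_r(series[row], series[col])
        for i, row in enumerate(names)
        for col in names[: i + 1]
    }


# Made-up weekly values, for illustration only.
example = {
    "week": [1, 2, 3, 4],
    "mean TEMP": [2, 4, 6, 8],
    "max VISIB": [4, 3, 2, 1],
}
matrix = lower_triangle(example)
```

Only the lower triangle is stored because the matrix is symmetric, which matches the layout of the table.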

TABLE S6 Sub-families detected in Austin or San Antonio correlating significantly with environmental parameters.

Phylum | Class | Order | Family | Sub-family | Taxon | Representative organism name | Environ. factor | Correl. coeff. | p value | BH adjusted p value^(a)
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1114 | clone PENDANT-38 | max TEMP | 0.64 | 4.05E−05 | 2.49E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1114 | clone PENDANT-38 | mean TEMP | 0.66 | 2.16E−05 | 2.01E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1114 | clone PENDANT-38 | week | 0.63 | 6.73E−05 | 3.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Gordoniaceae | sf_1 | 1116 | Gordona terrae | week | 0.61 | 1.18E−04 | 3.68E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1125 | Actinokineospora diospyrosa str. NRRL B-24047T | max TEMP | 0.6 | 1.53E−04 | 4.30E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1125 | Actinokineospora diospyrosa str. NRRL B-24047T | week | 0.63 | 7.42E−05 | 3.38E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptomycetaceae | sf_1 | 1128 | Streptomyces sp. str. YIM 80305 | week | 0.7 | 3.75E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Sporichthyaceae | sf_1 | 1223 | Sporichthya polymorpha | mean TEMP | 0.61 | 1.42E−04 | 4.21E−02
Actinobacteria | Actinobacteria | Actinomycetales | Sporichthyaceae | sf_1 | 1223 | Sporichthya polymorpha | min MINTEMP | 0.61 | 1.50E−04 | 4.27E−02
Actinobacteria | Actinobacteria | Actinomycetales | Sporichthyaceae | sf_1 | 1223 | Sporichthya polymorpha | week | 0.7 | 4.39E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Microbacteriaceae | sf_1 | 1264 | Waste-gas biofilter clone BIhi33 | mean TEMP | 0.61 | 1.47E−04 | 4.25E−02
Actinobacteria | Actinobacteria | Actinomycetales | Microbacteriaceae | sf_1 | 1264 | Waste-gas biofilter clone BIhi33 | week | 0.69 | 7.62E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptomycetaceae | sf_1 | 1344 | Streptomyces species | max TEMP | 0.64 | 5.42E−05 | 2.84E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptomycetaceae | sf_1 | 1344 | Streptomyces species | mean TEMP | 0.62 | 9.56E−05 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Thermomonosporaceae | sf_1 | 1406 | Actinomadura kijaniata | week | 0.65 | 2.91E−05 | 2.29E−02
Actinobacteria | Actinobacteria | Actinomycetales | Kineosporiaceae | sf_1 | 1424 | Actinomycetaceae SR 139 | max VISIB | 0.6 | 1.70E−04 | 4.59E−02
Actinobacteria | Actinobacteria | Actinomycetales | Kineosporiaceae | sf_1 | 1424 | Actinomycetaceae SR 139 | week | 0.62 | 8.03E−05 | 3.50E−02
Actinobacteria | Actinobacteria | Actinomycetales | Intrasporangiaceae | sf_1 | 1445 | Ornithinimicrobium humiphilum str. DSM 12362 HKI 124 | week | 0.62 | 9.46E−05 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1514 | uncultured human oral bacterium A11 | week | 0.69 | 7.08E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 | Pseudonocardia thermophila str. IMSNU 20112T | max TEMP | 0.64 | 5.10E−05 | 2.79E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 | Pseudonocardia thermophila str. IMSNU 20112T | mean TEMP | 0.66 | 1.99E−05 | 1.97E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 | Pseudonocardia thermophila str. IMSNU 20112T | min MINTEMP | 0.61 | 1.10E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 | Pseudonocardia thermophila str. IMSNU 20112T | min TEMP | 0.6 | 1.82E−04 | 4.73E−02
Actinobacteria | Actinobacteria | Actinomycetales | Pseudonocardiaceae | sf_1 | 1530 | Pseudonocardia thermophila str. IMSNU 20112T | week | 0.73 | 1.15E−06 | 5.92E−03
Actinobacteria | Actinobacteria | Actinomycetales | Cellulomonadaceae | sf_1 | 1592 | Lake Bogoria isolate 69B4 | week | 0.61 | 1.15E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Corynebacteriaceae | sf_1 | 1642 | Corynebacterium otitidis | max TEMP | 0.62 | 8.87E−05 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Corynebacteriaceae | sf_1 | 1642 | Corynebacterium otitidis | mean TEMP | 0.64 | 4.12E−05 | 2.49E−02
Actinobacteria | Actinobacteria | Actinomycetales | Corynebacteriaceae | sf_1 | 1642 | Corynebacterium otitidis | min MINTEMP | 0.62 | 1.07E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Corynebacteriaceae | sf_1 | 1642 | Corynebacterium otitidis | week | 0.63 | 5.53E−05 | 2.84E−02
Actinobacteria | Actinobacteria | Actinomycetales | Dermabacteraceae | sf_1 | 1736 | Brachybacterium rhamnosum LMG 19848T | max TEMP | 0.63 | 6.17E−05 | 3.09E−02
Actinobacteria | Actinobacteria | Actinomycetales | Dermabacteraceae | sf_1 | 1736 | Brachybacterium rhamnosum LMG 19848T | mean TEMP | 0.6 | 1.91E−04 | 4.90E−02
Actinobacteria | Actinobacteria | Actinomycetales | Dermabacteraceae | sf_1 | 1736 | Brachybacterium rhamnosum LMG 19848T | week | 0.64 | 4.47E−05 | 2.62E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptomycetaceae | sf_3 | 1743 | Streptomyces scabiei str. DNK-G01 | week | 0.6 | 1.60E−04 | 4.38E−02
Actinobacteria | Actinobacteria | Actinomycetales | Nocardiaceae | sf_1 | 1746 | Nocardia corynebacteroides | week | 0.66 | 2.48E−05 | 2.21E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1806 | French Polynesia: Tahiti clone 23 | max TEMP | 0.65 | 3.37E−05 | 2.29E−02
Actinobacteria | Actinobacteria | Actinomycetales | Unclassified | sf_3 | 1806 | French Polynesia: Tahiti clone 23 | mean TEMP | 0.66 | 1.97E−05 | 1.97E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 | Catellatospora subsp. citrea str. IMSNU 22008T | max TEMP | 0.61 | 1.10E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 | Catellatospora subsp. citrea str. IMSNU 22008T | mean MINTEMP | 0.61 | 1.22E−04 | 3.72E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 | Catellatospora subsp. citrea str. IMSNU 22008T | mean TEMP | 0.67 | 1.76E−05 | 1.97E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 | Catellatospora subsp. citrea str. IMSNU 22008T | min MINTEMP | 0.7 | 4.92E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 | Catellatospora subsp. citrea str. IMSNU 22008T | min TEMP | 0.65 | 2.68E−05 | 2.29E−02
Actinobacteria | Actinobacteria | Actinomycetales | Micromonosporaceae | sf_1 | 1821 | Catellatospora subsp. citrea str. IMSNU 22008T | week | 0.62 | 8.24E−05 | 3.52E−02
Actinobacteria | Actinobacteria | Rubrobacterales | Rubrobacteraceae | sf_1 | 1892 | Sturt arid-zone soil clone 0319-7H2 | min MINTEMP | 0.62 | 8.51E−05 | 3.56E−02
Actinobacteria | Actinobacteria | Rubrobacterales | Rubrobacteraceae | sf_1 | 1892 | Sturt arid-zone soil clone 0319-7H2 | week | 0.68 | 9.19E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1984 | Saccharothrix tangerinus str. MK27-91F2 | max TEMP | 0.68 | 8.01E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1984 | Saccharothrix tangerinus str. MK27-91F2 | mean TEMP | 0.67 | 1.64E−05 | 1.97E−02
Actinobacteria | Actinobacteria | Actinomycetales | Actinosynnemataceae | sf_1 | 1984 | Saccharothrix tangerinus str. MK27-91F2 | week | 0.7 | 3.54E−06 | 1.18E−02
Actinobacteria | Actinobacteria | Actinomycetales | Nocardiaceae | sf_1 | 1999 | Rhodococcus fascians str. DFA7 | max TEMP | 0.61 | 1.14E−04 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Propionibacteriaceae | sf_1 | 2023 | Propionibacterium propionicum str. DSM 43307T | week | 0.62 | 9.19E−05 | 3.63E−02
Actinobacteria | Actinobacteria | Actinomycetales | Streptosporangiaceae | sf_1 | 2037 | Nonomuraea terrinata str. DSM 44505 | week | 0.61 | 1.13E−04 | 3.63E−02
Firmicutes | Bacilli | Bacillales | Thermoactinomycetaceae | sf_1 | 3619 | Thermoactinomyces intermedius str. ATCC 33205T | range MINTEMP | 0.65 | 3.41E−05 | 2.29E−02
Cyanobacteria | Cyanobacteria | Symploca | Unclassified | sf_1 | 5165 | Symploca atlantica str. PCC 8002 | week | 0.63 | 6.84E−05 | 3.18E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 5491 | Austria: Lake Gossenkoellesee clone GKS2-106 | mean TEMP | 0.61 | 1.15E−04 | 3.63E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 5491 | Austria: Lake Gossenkoellesee clone GKS2-106 | week | 0.63 | 6.62E−05 | 3.18E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Flexibacteraceae | sf_19 | 5866 | Taxeobacter ocellatus str. Myx2105 | week | 0.62 | 1.08E−04 | 3.63E−02
Bacteroidetes | Bacteroidetes | Bacteroidales | Prevotellaceae | sf_1 | 6047 | deep marine sediment clone MB-A2-107 | week | 0.62 | 9.52E−05 | 3.63E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 6171 | Bifissio spartinae str. AS1.1762 | max PM2.5 | −0.62 | 9.95E−05 | 3.63E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 6171 | Bifissio spartinae str. AS1.1762 | max VISIB | 0.62 | 1.09E−04 | 3.63E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 6171 | Bifissio spartinae str. AS1.1762 | range PM2.5 | −0.65 | 2.86E−05 | 2.29E−02
Bacteroidetes | Sphingobacteria | Sphingobacteriales | Crenotrichaceae | sf_11 | 6171 | Bifissio spartinae str. AS1.1762 | week | 0.61 | 1.25E−04 | 3.76E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 6808 | PCB-polluted soil clone WD267 | mean TEMP | 0.63 | 7.73E−05 | 3.45E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 6808 | PCB-polluted soil clone WD267 | week | 0.69 | 5.54E−06 | 1.18E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7132 | Sphingomonas sp. K101 | min SLP | 0.64 | 5.16E−05 | 2.79E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7132 | Sphingomonas sp. K101 | week | 0.75 | 2.74E−07 | 2.81E−03
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Unclassified | sf_1 | 7255 | Pleomorphomonas oryzae str. B-32 | max TEMP | 0.65 | 3.57E−05 | 2.29E−02
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Unclassified | sf_1 | 7255 | Pleomorphomonas oryzae str. B-32 | mean TEMP | 0.64 | 4.62E−05 | 2.63E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7344 | rhizosphere soil RSI-21 | week | 0.68 | 8.96E−06 | 1.18E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7411 | Sphingomonas adhaesiva | min SLP | 0.66 | 2.01E−05 | 1.97E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7411 | Sphingomonas adhaesiva | week | 0.74 | 6.42E−07 | 4.39E−03
Proteobacteria | Alphaproteobacteria | Rhodobacterales | Rhodobacteraceae | sf_1 | 7527 | clone CTD56B | mean TEMP | 0.61 | 1.44E−04 | 4.23E−02
Proteobacteria | Alphaproteobacteria | Sphingomonadales | Sphingomonadaceae | sf_1 | 7555 | derived microbial ‘pearl’-community clone sipK48 | week | 0.6 | 1.60E−04 | 4.38E−02
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Methylobacteriaceae | sf_1 | 7593 | Methylobacterium organophilum | max TEMP | 0.65 | 3.53E−05 | 2.29E−02
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Methylobacteriaceae | sf_1 | 7593 | Methylobacterium organophilum | mean TEMP | 0.62 | 9.87E−05 | 3.63E−02
Proteobacteria | Alphaproteobacteria | Bradyrhizobiales | Methylobacteriaceae | sf_1 | 7593 | Methylobacterium organophilum | week | 0.68 | 8.06E−06 | 1.18E−02
Proteobacteria | Alphaproteobacteria | Devosia | Unclassified | sf_1 | 7626 | Devosia neptuniae str. J1 | week | 0.6 | 1.80E−04 | 4.73E−02
Proteobacteria | Betaproteobacteria | Burkholderiales | Comamonadaceae | sf_1 | 7786 | unidentified alpha proteobacterium | mean TEMP | 0.65 | 3.45E−05 | 2.29E−02
Proteobacteria | Betaproteobacteria | Burkholderiales | Burkholderiaceae | sf_1 | 7899 | Burkholderia andropogonis | week | 0.65 | 3.43E−05 | 2.29E−02
Proteobacteria | Gammaproteobacteria | Unclassified | Unclassified | sf_3 | 8759 | Agricultural soil SC-I-87 | max TEMP | 0.6 | 1.74E−04 | 4.63E−02
Proteobacteria | Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | sf_1 | 9389 | Pseudomonas oleovorans | min SLP | 0.68 | 8.57E−06 | 1.18E−02
Proteobacteria | Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | sf_1 | 9389 | Pseudomonas oleovorans | week | 0.83 | 1.03E−09 | 2.11E−05

^(a)P-value is adjusted for multiple comparisons using false discovery rate controlling procedure (S18).

All of the below are in the Domain of Bacteria
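The adjustment named in footnote (a) of Table S6 can be illustrated with a short sketch. It assumes that "BH" in the column header refers to the Benjamini-Hochberg step-up procedure; reference S18 is not reproduced here, so the exact procedure used is not confirmed. The function name and the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each raw p-value is scaled by m / rank (rank 1 = smallest p), and a
    running minimum taken from the largest p downwards keeps the adjusted
    values monotone in the raw values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
```

Because the scaling factor depends on the total number of tests m, the adjusted values in the table cannot be reproduced from the listed rows alone; they depend on every comparison that was tested, including the non-significant ones.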

TABLE S7 Bacterial sub-families detected (92% or greater of probes in probe set positive) most frequently over 17 week study.

Most frequently detected 16S rRNA gene sequences | AU | SA
Bacteria; Acidobacteria; Acidobacteria; Acidobacteriales; Acidobacteriaceae; sf_14 | 17 | 17
Bacteria; Acidobacteria; Acidobacteria-6; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Acidobacteria; Solibacteres; Unclassified; Unclassified; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Cellulomonadaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Corynebacteriaceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Gordoniaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Kineosporiaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Microbacteriaceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micrococcaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Micromonosporaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Mycobacteriaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Nocardiaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Promicromonosporaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Pseudonocardiaceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Streptomycetaceae; sf_1 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Thermomonosporaceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Actinomycetales; Unclassified; sf_3 | 17 | 17
Bacteria; Actinobacteria; Actinobacteria; Rubrobacterales; Rubrobacteraceae; sf_1 | 16 | 17
Bacteria; Actinobacteria; Actinobacteria; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Actinobacteria; BD2-10 group; Unclassified; Unclassified; sf_2 | 17 | 16
Bacteria; Bacteroidetes; Sphingobacteria; Sphingobacteriales; Unclassified; sf_3 | 16 | 17
Bacteria; Chloroflexi; Anaerolineae; Chloroflexi-la; Unclassified; sf_1 | 16 | 17
Bacteria; Chloroflexi; Anaerolineae; Unclassified; Unclassified; sf_9 | 16 | 17
Bacteria; Chloroflexi; Dehalococcoidetes; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Cyanobacteria; Cyanobacteria; Chloroplasts; Chloroplasts; sf_5 | 17 | 17
Bacteria; Cyanobacteria; Cyanobacteria; Plectonema; Unclassified; sf_1 | 16 | 17
Bacteria; Cyanobacteria; Unclassified; Unclassified; Unclassified; sf_5 | 16 | 17
Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; sf_1 | 17 | 17
Bacteria; Firmicutes; Bacilli; Bacillales; Halobacillaceae; sf_1 | 17 | 17
Bacteria; Firmicutes; Bacilli; Bacillales; Paenibacillaceae; sf_1 | 16 | 17
Bacteria; Firmicutes; Bacilli; Lactobacillales; Enterococcaceae; sf_1 | 17 | 17
Bacteria; Firmicutes; Bacilli; Lactobacillales; Streptococcaceae; sf_1 | 16 | 17
Bacteria; Firmicutes; Catabacter; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Clostridiaceae; sf_12 | 17 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Lachnospiraceae; sf_5 | 17 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Peptococc/Acidaminococc; sf_11 | 17 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Peptostreptococcaceae; sf_5 | 17 | 17
Bacteria; Firmicutes; Clostridia; Clostridiales; Unclassified; sf_17 | 16 | 17
Bacteria; Firmicutes; Unclassified; Unclassified; Unclassified; sf_8 | 16 | 17
Bacteria; Nitrospira; Nitrospira; Nitrospirales; Nitrospiraceae; sf_1 | 17 | 16
Bacteria; OP3; Unclassified; Unclassified; Unclassified; sf_4 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Acetobacterales; Acetobacteraceae; sf_1 | 17 | 16
Bacteria; Proteobacteria; Alphaproteobacteria; Azospirillales; Unclassified; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Beijerinck/Rhodoplan/Methylocyst; sf_3 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Bradyrhizobiaceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Hyphomicrobiaceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Bradyrhizobiales; Methylobacteriaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Ellin314/wr0007; Unclassified; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Bradyrhizobiaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Phyllobacteriaceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Unclassified; sf_1 | 16 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales; Rhodobacteraceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Unclassified; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Sphingomonadales; Sphingomonadaceae; sf_15 | 17 | 17
Bacteria; Proteobacteria; Alphaproteobacteria; Unclassified; Unclassified; sf_6 | 17 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Alcaligenaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Burkholderiaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Comamonadaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Oxalobacteraceae; sf_1 | 17 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Ralstoniaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Methylophilales; Methylophilaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Rhodocyclales; Rhodocyclaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Betaproteobacteria; Unclassified; Unclassified; sf_3 | 17 | 17
Bacteria; Proteobacteria; Deltaproteobacteria; Syntrophobacterales; Syntrophobacteraceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Campylobacteraceae; sf_3 | 17 | 17
Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Helicobacteraceae; sf_3 | 17 | 17
Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Unclassified; sf_1 | 17 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Alteromonadales; Alteromonadaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Chromatiales; Chromatiaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; sf_6 | 17 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Legionellales; Unclassified; sf_1 | 17 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Legionellales; Unclassified; sf_3 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Moraxellaceae; sf_3 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae; sf_1 | 16 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Unclassified; Unclassified; sf_3 | 17 | 17
Bacteria; Proteobacteria; Gammaproteobacteria; Xanthomonadales; Xanthomonadaceae; sf_3 | 17 | 17
Bacteria; TM7; TM7-3; Unclassified; Unclassified; sf_1 | 16 | 17
Bacteria; Unclassified; Unclassified; Unclassified; Unclassified; sf_148 | 16 | 17
Bacteria; Unclassified; Unclassified; Unclassified; Unclassified; sf_160 | 17 | 17
Bacteria; Verrucomicrobia; Verrucomicrobiae; Verrucomicrobiales; Verrucomicrobiaceae; sf_7 | 17 | 17
Number of sub-families detected in all samples over 17 week period | 43 | 80

Italic text indicates sub-families not found in all 17 weeks. AU = Austin, SA = San Antonio.

TABLE S8 Bacterial sub-families containing pathogens of public health and bioterrorism significance and their relatives that were detected in aerosols over the 17 week monitoring period.

Pathogens and relatives | Taxon # | Austin: weeks detected | Austin: % of weeks | San Antonio: weeks detected | San Antonio: % of weeks

Bacillus anthracis
Bacillus cohnii, B. psychrosaccharolyticus, B. benzoevorans | 3439 | 17 | 100.0 | 17 | 100.0
Bacillus megaterium | 3550 | 11 | 64.7 | 12 | 70.6
Bacillus horikoshii | 3904 | 9 | 52.9 | 14 | 82.4
Bacillus litoralis, B. macroides, B. psychrosaccharolyticus | 3337 | 5 | 29.4 | 8 | 47.1
Staphylococcus saprophyticus, S. xylosus, S. cohnii | 3659 | 7 | 41.2 | 15 | 88.2
Bacillus anthracis, cereus, thuringiensis, mycoides + others | 3262 | 0 | 0.0 | 1 | 5.9

Rickettsia prowazekii - rickettsii
Rickettsia australis, R. eschlimannii, R. typhi, R. tarasevichiae + others | 7556 | 2 | 11.8 | 5 | 29.4
Rickettsia prowazekii | 7114 | 0 | 0.0 | 0 | 0.0
Rickettsia rickettsii, R. japonica, R. honei + others | 6809 | 4 | 23.5 | 10 | 58.8

Burkholderia mallei - pseudomallei
Burkholderia pseudomallei, B. thailandensis | 7870 | 10 | 58.8 | 14 | 82.4
Burkholderia mallei | 7747 | 10 | 58.8 | 8 | 47.1
Burkholderia pseudomallei, Burkholderia cepacia, B. tropica, B. gladioli, B. stabilis, B. plantarii + others | 8097 | 13 | 76.5 | 15 | 88.2

Clostridium botulinum - perfringens
Clostridium butyricum, C. baratii, C. sardiniense + others | 4598 | 3 | 17.6 | 10 | 58.8
Clostridium botulinum type C | 4587 | 2 | 11.8 | 4 | 23.5
Clostridium perfringens | 4576 | 1 | 5.9 | 1 | 5.9
Clostridium botulinum type G | 4575 | 3 | 17.6 | 7 | 41.2
Clostridium botulinum types B and E | 4353 | 0 | 0.0 | 0 | 0.0

Francisella tularensis
Tilapia parasite | 9554 | 1 | 5.9 | 2 | 11.8
Francisella tularensis | 9180 | 0 | 0.0 | 0 | 0.0
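The "% of weeks" values in Table S8 are consistent with dividing the number of weeks in which a taxon was detected by the 17 weeks of monitoring and rounding to one decimal place. The following is a minimal sketch of that arithmetic; the function name and the rounding rule are assumptions made for illustration and are not taken from the specification.

```python
TOTAL_WEEKS = 17  # length of the monitoring period stated in the Table S8 caption


def percent_of_weeks(weeks_detected, total_weeks=TOTAL_WEEKS):
    """Return the share of monitored weeks with a detection, as a percentage.

    Hypothetical helper: rounding to one decimal place is an assumption
    chosen to match the precision shown in Table S8.
    """
    if not 0 <= weeks_detected <= total_weeks:
        raise ValueError("weeks_detected must lie between 0 and total_weeks")
    return round(100.0 * weeks_detected / total_weeks, 1)


# Bacillus megaterium (taxon 3550): 11 weeks in Austin, 12 weeks in San Antonio
austin = percent_of_weeks(11)
san_antonio = percent_of_weeks(12)
print(austin, san_antonio)
```

Under these assumptions the two calls reproduce the 64.7 and 70.6 entries listed for taxon 3550.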

TABLE S9 Distribution of array taxa among Bacterial and Archaeal phyla.

Phyla represented on array | Numbers of taxa in phylum

Archaea
Crenarchaeota | 79
Euryarchaeota | 224
Korarchaeota | 3
YNPFFA | 1
Archaeal taxa subtotal | 307

Bacteria
1959 group | 1
Acidobacteria | 98
Actinobacteria | 810
AD3 | 1
Aquificae | 19
Bacteroidetes | 880
BRC1 | 3
Caldithrix | 2
Chlamydiae | 27
Chlorobi | 21
Chloroflexi | 117
Chrysiogenetes | 1
Coprothermobacteria | 3
Cyanobacteria | 202
Deferribacteres | 5
Deinococcus-Thermus | 18
Dictyoglomi | 5
DSS1 | 2
EM3 | 2
Fibrobacteres | 4
Firmicutes | 2012
Fusobacteria | 29
Gemmatimonadetes | 15
LD1PA group | 1
Lentisphaerae | 8
marine group A | 5
Natronoanaerobium | 7
NC10 | 4
Nitrospira | 29
NKB19 | 2
OD1 | 4
OD2 | 6
OP1 | 5
OP10 | 12
OP11 | 20
OP3 | 5
OP5 | 3
OP8 | 8
OP9/JS1 | 12
OS-K | 2
OS-L | 1
Planctomycetes | 182
Proteobacteria | 3170
SPAM | 2
Spirochaetes | 150
SR1 | 4
Synergistes | 19
Termite group 1 | 6
Thermodesulfobacteria | 4
Thermotogae | 15
TM6 | 5
TM7 | 45
Unclassified | 329
Verrucomicrobia | 78
WS1 | 2
WS3 | 7
WS5 | 1
WS6 | 4
Bacterial taxa subtotal | 8434

Total taxa | 8741

EQUIVALENTS

The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the present embodiments. The foregoing description and Examples detail certain preferred embodiments and describe the best mode contemplated by the inventors. It will be appreciated, however, that no matter how detailed the foregoing may appear in text, the present embodiments may be practiced in many ways and the present embodiments should be construed in accordance with the appended claims and any equivalents thereof.

The term “comprising” is intended herein to be open-ended, including not only the recited elements, but further encompassing any additional elements.

1. An array system comprising: a microarray configured to simultaneously detect a plurality of organisms in a sample, wherein the microarray comprises fragments of 16s RNA unique to each organism and variants of said fragments comprising at least 1 nucleotide mismatch, wherein the level of confidence of species-specific detection derived from fragment matches is about 90% or higher.
2. The array system of claim 1, wherein the plurality of organisms comprise bacteria or archaea.
3. The array system of claim 1, wherein the fragments of 16s RNA are clustered and aligned into groups of similar sequence such that detection of an organism based on at least 11 fragment matches is possible.
4. The array system of claim 1, wherein the level of confidence of species-specific detection derived from fragment matches is about 95% or higher.
5. The array system of claim 1, wherein the level of confidence of species-specific detection derived from fragment matches is about 98% or higher.
6. The array system of claim 1, wherein the majority of fragments of 16s RNA unique to each organism have at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.
7. The array system of claim 1, wherein every fragment of 16s RNA unique to each organism has at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.
8. The array system of claim 1, wherein the fragments are about 25 nucleotides long.
9. The array system of claim 1, wherein the sample is an environmental sample.
10. The array system of claim 9, wherein the environmental sample comprises at least one of soil, water or atmosphere.
11. The array system of claim 1, wherein the sample is a clinical sample.
12. The array system of claim 1, wherein the clinical sample comprises at least one of tissue, skin, bodily fluid or blood.
13. A method of detecting at least one organism comprising: applying a sample comprising a plurality of organisms to the array system of claim 1; and identifying organisms in the sample.
14. The method of claim 13, wherein the plurality of organisms comprise bacteria or archaea.
15. The method of claim 13, wherein the majority of fragments of 16s RNA unique to each organism have at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.
16. The method of claim 13, wherein every fragment of 16s RNA unique to each organism has at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.
17. The method of claim 13, wherein the fragments are about 25 nucleotides long.
18. The method of claim 13, wherein the at least one organism to be detected is the most metabolically active organism or organisms in the sample.
19. A method of fabricating an array system comprising: identifying 16s RNA sequences corresponding to a plurality of organisms of interest; selecting fragments of 16s RNA unique to each organism; creating variant RNA fragments corresponding to the fragments of 16s RNA unique to each organism which comprise at least 1 nucleotide mismatch; and fabricating said array system.
20. The method of claim 19, wherein the plurality of organisms comprise bacteria or archaea.
21. The method of claim 19, wherein the majority of fragments of 16s RNA unique to each organism have at least one corresponding variant fragment comprising at least 1 nucleotide mismatch.
22. The method of claim 19, wherein every fragment of 16s RNA unique to each organism has at least 1 corresponding variant fragment comprising at least 1 nucleotide mismatch.
23. The method of claim 19, wherein the fragments are about 25 nucleotides long.
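
By way of non-limiting illustration only, the fragment-selection and variant-creation steps recited in claim 19 might be sketched as follows. The sketch is not the inventors' implementation. The function names, the use of a DNA alphabet for the sequences, the exact-match test for uniqueness, the placement of the single mismatch at the central position, and the substitution rule are all assumptions introduced here for the example.

```python
from collections import defaultdict

PROBE_LENGTH = 25  # the claims recite fragments "about 25 nucleotides long"

# Assumed substitution rule for the mismatch: swap each base for its complement.
SWAP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def unique_fragments(sequences, k=PROBE_LENGTH):
    """Map each organism to the k-mers found in its sequence and in no other.

    `sequences` is a hypothetical dict of organism name -> sequence string.
    """
    owners = defaultdict(set)
    for organism, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(organism)

    unique = defaultdict(list)
    for fragment, organisms in owners.items():
        if len(organisms) == 1:  # fragment occurs in exactly one organism
            unique[next(iter(organisms))].append(fragment)
    return dict(unique)


def mismatch_variant(fragment, position=None):
    """Return a copy of `fragment` carrying exactly one nucleotide mismatch.

    The central position is used by default (an assumption of this sketch).
    """
    if position is None:
        position = len(fragment) // 2
    replacement = SWAP[fragment[position]]
    return fragment[:position] + replacement + fragment[position + 1:]


# Toy input with short sequences and k=4 so the result is easy to inspect.
toy = {"orgA": "ACGTAC", "orgB": "CGTACG"}
probes = unique_fragments(toy, k=4)
pairs = {org: [(f, mismatch_variant(f)) for f in frags]
         for org, frags in probes.items()}
print(pairs)
```

Each unique fragment is paired with its single-mismatch variant, which corresponds to the arrangement of dependent claim 22 in which every unique fragment has at least one corresponding variant fragment.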